From 9db670b996416bb1551a1f16c6aece0594f29ecc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Florijan=20Stamenkovi=C4=87?=
Date: Tue, 6 Oct 2020 11:17:37 +0200
Subject: [PATCH 01/31] Fix Issue 6207 (#6208)

* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test

Co-authored-by: Adriane Boyd
---
 .github/contributors/florijanstamenkovic.md | 106 ++++++++++++++++++++
 spacy/tests/regression/test_issue6207.py    |  18 ++++
 spacy/util.py                               |   2 +-
 3 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 .github/contributors/florijanstamenkovic.md
 create mode 100644 spacy/tests/regression/test_issue6207.py

diff --git a/.github/contributors/florijanstamenkovic.md b/.github/contributors/florijanstamenkovic.md
new file mode 100644
index 000000000..65da875b1
--- /dev/null
+++ b/.github/contributors/florijanstamenkovic.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Florijan Stamenkovic |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-10-05           |
+| GitHub username                | florijanstamenkovic  |
+| Website (optional)             |                      |

diff --git a/spacy/tests/regression/test_issue6207.py b/spacy/tests/regression/test_issue6207.py
new file mode 100644
index 000000000..3c9c3ce89
--- /dev/null
+++ b/spacy/tests/regression/test_issue6207.py
@@ -0,0 +1,18 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from spacy.util import filter_spans
+
+
+def test_issue6207(en_tokenizer):
+    doc = en_tokenizer("zero one two three four five six")
+
+    # Make spans
+    s1 = doc[:4]
+    s2 = doc[3:6]  # overlaps with s1
+    s3 = doc[5:7]  # overlaps with s2, not s1
+
+    result = filter_spans((s1, s2, s3))
+    assert s1 in result
+    assert s2 not in result
+    assert s3 in result

diff --git a/spacy/util.py b/spacy/util.py
index 923f56b31..735bfc53b 100644
--- a/spacy/util.py
+++ b/spacy/util.py
@@ -648,7 +648,7 @@ def filter_spans(spans):
         # Check for end - 1 here because boundaries are inclusive
         if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
             result.append(span)
-        seen_tokens.update(range(span.start, span.end))
+            seen_tokens.update(range(span.start, span.end))
     result = sorted(result, key=lambda span: span.start)
     return result

From 047fb9f8b8cfe99abc8455aa990fa2c2dd3d4c84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=A0ar=C5=ABnas=20Navickas?= <606346+zaibacu@users.noreply.github.com>
Date: Tue, 6 Oct 2020 12:19:36 +0300
Subject: [PATCH 02/31] Website
 (Universe): An entry for rita-dsl (#6138)

* Create zaibacu.md
* Add RITA-DSL entry
* Update agreement
* Fix formatting
---
 .github/contributors/zaibacu.md | 106 ++++++++++++++++++++
 website/meta/universe.json      |  36 +++++++++++
 2 files changed, 142 insertions(+)
 create mode 100644 .github/contributors/zaibacu.md

diff --git a/.github/contributors/zaibacu.md b/.github/contributors/zaibacu.md
new file mode 100644
index 000000000..365b89848
--- /dev/null
+++ b/.github/contributors/zaibacu.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+[... standard spaCy contributor agreement text, identical to the copy in
+.github/contributors/florijanstamenkovic.md above ...]
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Šarūnas Navickas     |
+| Company name (if applicable)   | TokenMill            |
+| Title or role (if applicable)  | Data Engineer        |
+| Date                           | 2020-09-24           |
+| GitHub username                | zaibacu              |
+| Website (optional)             |                      |

diff --git a/website/meta/universe.json b/website/meta/universe.json
index bd2cff65a..2badbdeb7 100644
--- a/website/meta/universe.json
+++ b/website/meta/universe.json
@@ -2532,6 +2532,42 @@
     "author_links": {
         "github": "abchapman93"
     }
+  },
+  {
+    "id": "rita-dsl",
+    "title": "RITA DSL",
+    "slogan": "Domain Specific Language for creating language rules",
+    "github": "zaibacu/rita-dsl",
+    "description": "A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format",
+    "pip": "rita-dsl",
+    "thumb": "https://raw.githubusercontent.com/zaibacu/rita-dsl/master/docs/assets/logo-100px.png",
+    "code_language": "python",
+    "code_example": [
+      "import spacy",
+      "from rita.shortcuts import setup_spacy",
+      "",
+      "rules = \"\"\"",
+      "cuts = {\"fitted\", \"wide-cut\"}",
+      "lengths = {\"short\", \"long\", \"calf-length\", \"knee-length\"}",
+      "fabric_types = {\"soft\", \"airy\", \"crinkled\"}",
+      "fabrics = {\"velour\", \"chiffon\", \"knit\", \"woven\", \"stretch\"}",
+      "",
+      "{IN_LIST(cuts)?, IN_LIST(lengths), WORD(\"dress\")}->MARK(\"DRESS_TYPE\")",
+      "{IN_LIST(lengths), IN_LIST(cuts), WORD(\"dress\")}->MARK(\"DRESS_TYPE\")",
+      "{IN_LIST(fabric_types)?, IN_LIST(fabrics)}->MARK(\"DRESS_FABRIC\")",
+      "\"\"\"",
+      "",
+      "nlp = spacy.load(\"en\")",
+      "setup_spacy(nlp, rules_string=rules)",
+      "r = nlp(\"She was wearing a short wide-cut dress\")",
+      "print(list([{\"label\": e.label_, \"text\": e.text} for e in r.ents]))"
+    ],
+    "category": ["standalone"],
+    "tags": ["dsl", "language-patterns", "language-rules", "nlp"],
+    "author": "Šarūnas Navickas",
+    "author_links": {
+      "github": "zaibacu"
+    }
   }
 ],

From 15ea401b39de37d4037b75be97802fef040e4a16 Mon Sep 17 00:00:00 2001
From: delzac
Date: Tue, 6 Oct 2020 21:11:01 +0800
Subject: [PATCH 03/31] Reflect on usage doc that IS_SENT_START attribute
 exist (#6114)

* Reflect on usage doc that IS_SENT_START attribute exist
* Create delzac.md
---
 .github/contributors/delzac.md            | 106 ++++++++++++++++++++++
 website/docs/usage/rule-based-matching.md |   1 +
 2 files changed, 107 insertions(+)
 create mode 100644 .github/contributors/delzac.md

diff --git a/.github/contributors/delzac.md b/.github/contributors/delzac.md
new file mode 100644
index 000000000..0fcfe6f2f
--- /dev/null
+++ b/.github/contributors/delzac.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+[... standard spaCy contributor agreement text, identical to the copy in
+.github/contributors/florijanstamenkovic.md above ...]
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Matthew Chin         |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-09-22           |
+| GitHub username                | delzac               |
+| Website (optional)             |                      |

diff --git a/website/docs/usage/rule-based-matching.md b/website/docs/usage/rule-based-matching.md
index 252aa8c77..7749dab59 100644
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -166,6 +166,7 @@ rule-based matching are:
 |  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | bool | Token text consists of alphabetic characters, ASCII characters, digits. |
 |  `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | bool | Token text is in lowercase, uppercase, titlecase. |
 |  `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | bool | Token is punctuation, whitespace, stop word. |
+|  `IS_SENT_START` | bool | Token is start of sentence. |
 |  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | bool | Token text resembles a number, URL, email. |
 |  `POS`, `TAG`, `DEP`, `LEMMA`, `SHAPE` | unicode | The token's simple and extended part-of-speech tag, dependency label, lemma, shape. |
 | `ENT_TYPE` | unicode | The token's entity label. |
From c809b2c8e7ba953265d9cdac8bc7a6e8f3489683 Mon Sep 17 00:00:00 2001
From: Nuccy90 <35872837+Nuccy90@users.noreply.github.com>
Date: Tue, 6 Oct 2020 15:14:47 +0200
Subject: [PATCH 04/31] Update morph_rules.py (#6102)

* Update morph_rules.py

Added "dig" and "dej" ("you" in accusative form)

* Create Nuccy90.md
* Update Nuccy90.md
---
 .github/contributors/Nuccy90.md | 106 ++++++++++++++++++++
 spacy/lang/sv/morph_rules.py    |  14 +++++
 2 files changed, 120 insertions(+)
 create mode 100644 .github/contributors/Nuccy90.md

diff --git a/.github/contributors/Nuccy90.md b/.github/contributors/Nuccy90.md
new file mode 100644
index 000000000..2d1adb825
--- /dev/null
+++ b/.github/contributors/Nuccy90.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+[... standard spaCy contributor agreement text, identical to the copy in
+.github/contributors/florijanstamenkovic.md above ...]
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Elena Fano           |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-09-21           |
+| GitHub username                | Nuccy90              |
+| Website (optional)             |                      |

diff --git a/spacy/lang/sv/morph_rules.py b/spacy/lang/sv/morph_rules.py
index 77744813f..a131ce49d 100644
--- a/spacy/lang/sv/morph_rules.py
+++ b/spacy/lang/sv/morph_rules.py
@@ -35,6 +35,20 @@ MORPH_RULES = {
         "Number": "Sing",
         "Case": "Nom",
     },
+    "dig": {
+        LEMMA: PRON_LEMMA,
+        "PronType": "Prs",
+        "Person": "Two",
+        "Number": "Sing",
+        "Case": "Acc",
+    },
+    "dej": {
+        LEMMA: PRON_LEMMA,
+        "PronType": "Prs",
+        "Person": "Two",
+        "Number": "Sing",
+        "Case": "Acc",
+    },
     "han": {
         LEMMA: PRON_LEMMA,
         "PronType": "Prs",

From 1a00bff06d7e9632fc5a647265cf70acaea73a6d Mon Sep 17 00:00:00 2001
From: Rahul Gupta
Date: Wed, 7 Oct 2020 13:53:32 +0530
Subject: [PATCH 05/31] Hindi: Adds tests for lexical attributes (norm and
 like_num) (#5829)

* Hindi: Adds tests for lexical attributes (norm and like_num)
* Signs and adds the contributor agreement
* Add ordinal numbers to be tagged as like_num
* Adds alternate pronunciation for 31 and 39
---
 .github/contributors/rahul1990gupta.md | 106 +++++++++++++++++++++++
 spacy/lang/hi/lex_attrs.py             | 111 +++++++++++++++++++++++--
 spacy/tests/conftest.py                |   5 ++
 spacy/tests/lang/hi/__init__.py        |   0
 spacy/tests/lang/hi/test_lex_attrs.py  |  44 ++++++++++
 5 files changed, 260 insertions(+), 6 deletions(-)
 create mode 100644 .github/contributors/rahul1990gupta.md
 create mode 100644 spacy/tests/lang/hi/__init__.py
 create mode
100644 spacy/tests/lang/hi/test_lex_attrs.py

diff --git a/.github/contributors/rahul1990gupta.md b/.github/contributors/rahul1990gupta.md
new file mode 100644
index 000000000..eab41b3b1
--- /dev/null
+++ b/.github/contributors/rahul1990gupta.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+[... standard spaCy contributor agreement text, identical to the copy in
+.github/contributors/florijanstamenkovic.md above ...]
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Rahul Gupta          |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 28 July 2020         |
+| GitHub username                | rahul1990gupta       |
+| Website (optional)             |                      |

diff --git a/spacy/lang/hi/lex_attrs.py b/spacy/lang/hi/lex_attrs.py
index 12666d96a..515dd0be3 100644
--- a/spacy/lang/hi/lex_attrs.py
+++ b/spacy/lang/hi/lex_attrs.py
@@ -13,23 +13,26 @@ _stem_suffixes = [
     ["ाएगी", "ाएगा", "ाओगी", "ाओगे", "एंगी", "ेंगी", "एंगे", "ेंगे", "ूंगी", "ूंगा", "ातीं", "नाओं", "नाएं", "ताओं", "ताएं", "ियाँ", "ियों", "ियां"],
     ["ाएंगी", "ाएंगे", "ाऊंगी", "ाऊंगा", "ाइयाँ", "ाइयों", "ाइयां"]
 ]
 # fmt: on

-# reference 1:https://en.wikipedia.org/wiki/Indian_numbering_system
+# reference 1: https://en.wikipedia.org/wiki/Indian_numbering_system
 # reference 2: https://blogs.transparent.com/hindi/hindi-numbers-1-100/
+# reference 3: https://www.mindurhindi.com/basic-words-and-phrases-in-hindi/

-_num_words = [
+_one_to_ten = [
     "शून्य",
     "एक",
     "दो",
     "तीन",
     "चार",
-    "पांच",
+    "पांच", "पाँच",
     "छह",
     "सात",
     "आठ",
     "नौ",
     "दस",
+]
+
+_eleven_to_beyond = [
     "ग्यारह",
     "बारह",
     "तेरह",
     "चौदह",
     "पंद्रह",
     "सोलह",
     "सत्रह",
     "अठारह",
     "उन्नीस",
     "बीस",
+    "इकीस", "इक्कीस",
+    "बाईस",
+    "तेइस",
+    "चौबीस",
+    "पच्चीस",
+    "छब्बीस",
+    "सताइस", "सत्ताइस",
+    "अट्ठाइस",
+    "उनतीस",
     "तीस",
+    "इकतीस", "इकत्तीस",
+    "बतीस", "बत्तीस",
+    "तैंतीस",
+    "चौंतीस",
+    "पैंतीस",
+    "छतीस", "छत्तीस",
+    "सैंतीस",
+    "अड़तीस",
+    "उनतालीस", "उनत्तीस",
     "चालीस",
+    "इकतालीस",
+    "बयालीस",
+    "तैतालीस",
+    "चवालीस",
+    "पैंतालीस",
+    "छयालिस",
+    "सैंतालीस",
+    "अड़तालीस",
+    "उनचास",
     "पचास",
+    "इक्यावन",
+    "बावन",
+    "तिरपन", "तिरेपन",
+    "चौवन", "चउवन",
+    "पचपन",
+    "छप्पन",
+    "सतावन", "सत्तावन",
+    "अठावन",
+    "उनसठ",
     "साठ",
+    "इकसठ",
+    "बासठ",
+    "तिरसठ", "तिरेसठ",
+    "चौंसठ",
+    "पैंसठ",
+    "छियासठ",
+    "सड़सठ",
+    "अड़सठ",
+    "उनहत्तर",
     "सत्तर",
+    "इकहत्तर",
+    "बहत्तर",
+    "तिहत्तर",
+    "चौहत्तर",
+    "पचहत्तर",
+    "छिहत्तर",
"सतहत्तर", + "अठहत्तर", + "उन्नासी", "उन्यासी" "अस्सी", + "इक्यासी", + "बयासी", + "तिरासी", + "चौरासी", + "पचासी", + "छियासी", + "सतासी", + "अट्ठासी", + "नवासी", "नब्बे", + "इक्यानवे", + "बानवे", + "तिरानवे", + "चौरानवे", + "पचानवे", + "छियानवे", + "सतानवे", + "अट्ठानवे", + "निन्यानवे", "सौ", "हज़ार", "लाख", @@ -55,6 +130,22 @@ _num_words = [ "खरब", ] +_num_words = _one_to_ten + _eleven_to_beyond + +_ordinal_words_one_to_ten = [ + "प्रथम", "पहला", + "द्वितीय", "दूसरा", + "तृतीय", "तीसरा", + "चौथा", + "पांचवाँ", + "छठा", + "सातवाँ", + "आठवाँ", + "नौवाँ", + "दसवाँ", +] +_ordinal_suffix = "वाँ" +# fmt: on def norm(string): # normalise base exceptions, e.g. punctuation or currency symbols @@ -67,7 +158,7 @@ def norm(string): for suffix_group in reversed(_stem_suffixes): length = len(suffix_group[0]) if len(string) <= length: - break + continue for suffix in suffix_group: if string.endswith(suffix): return string[:-length] @@ -77,7 +168,7 @@ def norm(string): def like_num(text): if text.startswith(("+", "-", "±", "~")): text = text[1:] - text = text.replace(", ", "").replace(".", "") + text = text.replace(",", "").replace(".", "") if text.isdigit(): return True if text.count("/") == 1: @@ -86,6 +177,14 @@ def like_num(text): return True if text.lower() in _num_words: return True + + # check ordinal numbers + # reference: http://www.englishkitab.com/Vocabulary/Numbers.html + if text in _ordinal_words_one_to_ten: + return True + if text.endswith(_ordinal_suffix): + if text[:-len(_ordinal_suffix)] in _eleven_to_beyond: + return True return False diff --git a/spacy/tests/conftest.py b/spacy/tests/conftest.py index bf9851178..8f1ac55cb 100644 --- a/spacy/tests/conftest.py +++ b/spacy/tests/conftest.py @@ -123,6 +123,11 @@ def he_tokenizer(): return get_lang_class("he").Defaults.create_tokenizer() +@pytest.fixture(scope="session") +def hi_tokenizer(): + return get_lang_class("hi").Defaults.create_tokenizer() + + @pytest.fixture(scope="session") def hr_tokenizer(): return 
get_lang_class("hr").Defaults.create_tokenizer() diff --git a/spacy/tests/lang/hi/__init__.py b/spacy/tests/lang/hi/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/spacy/tests/lang/hi/test_lex_attrs.py b/spacy/tests/lang/hi/test_lex_attrs.py new file mode 100644 index 000000000..e3cfffb89 --- /dev/null +++ b/spacy/tests/lang/hi/test_lex_attrs.py @@ -0,0 +1,44 @@ +# coding: utf-8 +from __future__ import unicode_literals + +import pytest +from spacy.lang.hi.lex_attrs import norm, like_num + + +def test_hi_tokenizer_handles_long_text(hi_tokenizer): + text = """ +ये कहानी 1900 के दशक की है। कौशल्या (स्मिता जयकर) को पता चलता है कि उसका +छोटा बेटा, देवदास (शाहरुख खान) वापस घर आ रहा है। देवदास 10 साल पहले कानून की +पढ़ाई करने के लिए इंग्लैंड गया था। उसके लौटने की खुशी में ये बात कौशल्या अपनी पड़ोस +में रहने वाली सुमित्रा (किरण खेर) को भी बता देती है। इस खबर से वो भी खुश हो जाती है। +""" + tokens = hi_tokenizer(text) + assert len(tokens) == 86 + + +@pytest.mark.parametrize( + "word,word_norm", + [ + ("चलता", "चल"), + ("पढ़ाई", "पढ़"), + ("देती", "दे"), + ("जाती", "ज"), + ("मुस्कुराकर", "मुस्कुर"), + ], +) +def test_hi_norm(word, word_norm): + assert norm(word) == word_norm + + +@pytest.mark.parametrize( + "word", ["१९८७", "1987", "१२,२६७", "उन्नीस", "पाँच", "नवासी", "५/१०"], +) +def test_hi_like_num(word): + assert like_num(word) + + +@pytest.mark.parametrize( + "word", ["पहला", "तृतीय", "निन्यानवेवाँ", "उन्नीस", "तिहत्तरवाँ", "छत्तीसवाँ",], +) +def test_hi_like_num_ordinal_words(word): + assert like_num(word) From b95a11dd959bf614f359c15e087d0fc1e53584cf Mon Sep 17 00:00:00 2001 From: Duygu Altinok Date: Wed, 7 Oct 2020 10:25:37 +0200 Subject: [PATCH 06/31] Ordinal numbers for Turkish (#6142) * minor ordinal number addition * fixed typo * added corresponding lexical test --- spacy/lang/tr/lex_attrs.py | 44 +++++++++++++++++++++++++++++++- spacy/tests/lang/tr/test_text.py | 32 +++++++++++++++++++++++ 2 files changed, 75 insertions(+), 1 deletion(-) 
create mode 100644 spacy/tests/lang/tr/test_text.py diff --git a/spacy/lang/tr/lex_attrs.py b/spacy/lang/tr/lex_attrs.py index 93f26fc8e..366bda9e7 100644 --- a/spacy/lang/tr/lex_attrs.py +++ b/spacy/lang/tr/lex_attrs.py @@ -35,6 +35,36 @@ _num_words = [ ] +_ordinal_words = [ + "birinci", + "ikinci", + "üçüncü", + "dördüncü", + "beşinci", + "altıncı", + "yedinci", + "sekizinci", + "dokuzuncu", + "onuncu", + "yirminci", + "otuzuncu", + "kırkıncı", + "ellinci", + "altmışıncı", + "yetmişinci", + "sekseninci", + "doksanıncı", + "yüzüncü", + "bininci", + "milyonuncu", + "milyarıncı", + "trilyonuncu", + "katrilyonuncu", + "kentilyonuncu", +] + +_ordinal_endings = ("inci", "ıncı", "nci", "ncı", "uncu", "üncü") + def like_num(text): if text.startswith(("+", "-", "±", "~")): text = text[1:] @@ -45,8 +75,20 @@ num, denom = text.split("/") if num.isdigit() and denom.isdigit(): return True - if text.lower() in _num_words: + + text_lower = text.lower() + + # Check cardinal number + if text_lower in _num_words: return True + + # Check ordinal number + if text_lower in _ordinal_words: + return True + if text_lower.endswith(_ordinal_endings): + if text_lower[:-3].isdigit() or text_lower[:-4].isdigit(): + return True + return False diff --git a/spacy/tests/lang/tr/test_text.py b/spacy/tests/lang/tr/test_text.py new file mode 100644 index 000000000..2fe638b5f --- /dev/null +++ b/spacy/tests/lang/tr/test_text.py @@ -0,0 +1,32 @@ +# coding: utf-8 +from __future__ import unicode_literals + +import pytest +from spacy.lang.tr.lex_attrs import like_num + + +@pytest.mark.parametrize( + "word", + [ + "bir", + "iki", + "dört", + "altı", + "milyon", + "100", + "birinci", + "üçüncü", + "beşinci", + "100üncü", + "8inci" + ] +) +def test_tr_lex_attrs_like_number_cardinal_ordinal(word): + assert like_num(word) + + +@pytest.mark.parametrize("word", ["beş", "yedi", "yedinci", "birinci"]) +def test_tr_lex_attrs_capitals(word): + assert like_num(word) + assert like_num(word.upper())
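The suffix branch added to `like_num` above strips either three or four trailing characters depending on which ordinal ending matched. A minimal standalone sketch of that logic (the helper name `looks_like_tr_ordinal` is hypothetical, introduced here only for illustration; it mirrors the diff's `_ordinal_endings` tuple rather than being part of spaCy's API):

```python
# _ordinal_endings mirrors the tuple added in the diff above. Since
# str.endswith accepts a tuple of suffixes, one call covers all endings;
# the tuple mixes 3- and 4-character suffixes, so both slice lengths are
# tried when checking whether the stem is a digit string (e.g. "8inci",
# "100üncü").
_ordinal_endings = ("inci", "ıncı", "nci", "ncı", "uncu", "üncü")


def looks_like_tr_ordinal(text):
    text_lower = text.lower()
    if text_lower.endswith(_ordinal_endings):
        # Strip 3 or 4 characters and see if a bare number remains.
        if text_lower[:-3].isdigit() or text_lower[:-4].isdigit():
            return True
    return False


print(looks_like_tr_ordinal("8inci"))    # True
print(looks_like_tr_ordinal("100üncü"))  # True
print(looks_like_tr_ordinal("sekiz"))    # False
```

Note that word-form ordinals like "birinci" are not caught by this branch; in the diff they are handled by the separate `_ordinal_words` membership check.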
+ From 2ce6fc26114c230fc855a28f8cd21749759b0168 Mon Sep 17 00:00:00 2001 From: Duygu Altinok Date: Wed, 7 Oct 2020 10:27:36 +0200 Subject: [PATCH 07/31] Turkish tag map and morph rules addition (#6141) * feat: added turkish tag map * feat: morph rules cconj and sconj * feat: more conjuncts * feat: added popular postpositions * feat: added adverbs * feat: added personal pronouns * feat: added reflexive pronouns * minor: corrected case capital * minor: fixed comma typo * feat: added indef pronouns * feat: added dict iter * fixed comma typo * updated language class with tag map and morph * use default tag map instead * removed tag map --- spacy/lang/tr/__init__.py | 4 + spacy/lang/tr/morph_rules.py | 3905 ++++++++++++++++++++++++++++++++++ 2 files changed, 3909 insertions(+) create mode 100644 spacy/lang/tr/morph_rules.py diff --git a/spacy/lang/tr/__init__.py b/spacy/lang/tr/__init__.py index 2553e7c0f..78a174f15 100644 --- a/spacy/lang/tr/__init__.py +++ b/spacy/lang/tr/__init__.py @@ -3,6 +3,8 @@ from __future__ import unicode_literals from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS from .stop_words import STOP_WORDS +from .lex_attrs import LEX_ATTRS +from .morph_rules import MORPH_RULES from ..tokenizer_exceptions import BASE_EXCEPTIONS from ..norm_exceptions import BASE_NORMS @@ -13,12 +15,14 @@ from ...util import update_exc, add_lookups class TurkishDefaults(Language.Defaults): lex_attr_getters = dict(Language.Defaults.lex_attr_getters) + lex_attr_getters.update(LEX_ATTRS) lex_attr_getters[LANG] = lambda text: "tr" lex_attr_getters[NORM] = add_lookups( Language.Defaults.lex_attr_getters[NORM], BASE_NORMS ) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) stop_words = STOP_WORDS + morph_rules = MORPH_RULES class Turkish(Language): diff --git a/spacy/lang/tr/morph_rules.py b/spacy/lang/tr/morph_rules.py new file mode 100644 index 000000000..02302c504 --- /dev/null +++ b/spacy/lang/tr/morph_rules.py @@ -0,0 +1,3905 @@ +# coding: utf8 
+from __future__ import unicode_literals + +from ...symbols import LEMMA, PRON_LEMMA + +_adverbs = [ + "apansızın", + "aslen", + "aynen", + "ayrıyeten", + "basbayağı", + "başaşağı", + "belki", + "çatkapı", + "demin", + "derhal", + "doyasıya", + "düpedüz", + "ebediyen", + "elbet", + "elbette", + "enikonu", + "epey", + "epeyce", + "epeydir", + "esasen", + "evvela", + "galiba", + "gayet", + "genellikle", + "gerçekten", + "gerisingeri", + "giderayak", + "gitgide", + "gıyaben", + "gözgöze", + "güçbela", + "gündüzleyin", + "güya", + "habire", + "hakikaten", + "hakkaten", + "halen", + "halihazırda", + "harfiyen", + "haricen", + "hasbelkader", + "hemen", + "henüz", + "hep", + "hepten", + "herhalde", + "hiç", + "hükmen", + "ihtiyaten", + "illaki", +"ismen", + "iştiraken", + "izafeten", + "kalben", + "kargatulumba", + "kasten", + "katiyen", + "katiyyen", + "kazara", + "kefaleten", + "kendiliğinden", + "kerhen", + "kesinkes", + "kesinlikle", + "keşke", + "kimileyin", + "külliyen", + "layıkıyla", + "maalesef", + "mahsusçuktan", + "masumane", + "malulen", + "mealen", + "mecazen", + "mecburen", + "muhakkak", + "muhtemelen", + "mutlaka", + "müstacelen", + "müştereken", + "müteakiben", + "naçizane", + "nadiren", + "nakden", + "naklen", + "nazikane", + "nerdeyse", + "neredeyse", + "nispeten", + "nöbetleşe", + "olabildiğince", + "olduğunca", + "ortaklaşa", + "otomatikman", + "öğlenleyin", + "öğleyin", + "öldüresiye", + "ölesiye", + "örfen", + "öyle", + "öylesine", + "özellikle", + "peşinen", + "peşpeşe", + "peyderpey", + "ruhen", + "sadece", + "sahi", + "sahiden", + "salt", + "salimen", + "sanırım", + "sanki", + "sehven", + "senlibenli", + "sereserpe", + "sırf", + "sözgelimi", + "sözgelişi", + "şahsen", + "şakacıktan", + "şeklen", + "şıppadak", + "şimdilik", + "şipşak", + "tahminen", + "takdiren", + "takiben", + "tamamen", + "tamamiyle", + "tedbiren", + "temsilen", + "tepetaklak", + "tercihen", + "tesadüfen", + "tevekkeli", + "tezelden", + "tıbben", + "tıkabasa", + "tıpatıp", + 
"toptan", + "tümüyle", + "uluorta", + "usulcacık", + "usulen", + "üstünkörü", + "vekaleten", + "vicdanen", + "yalancıktan", + "yavaşçacık", + "yekten", + "yeniden", + "yeterince", + "yine", + "yüzükoyun", + "yüzüstü", + "yüzyüze", + "zaten", + "zımmen", + "zihnen", + "zilzurna" + ] + +_postpositions = [ + "geçe", + "gibi", + "göre", + "ilişkin", + "kadar", + "kala", + "karşın", + "nazaran", + "rağmen", + "üzere" + ] + +_subordinating_conjunctions = [ + "eğer", + "madem", + "mademki", + "şayet" + ] + +_coordinating_conjunctions = [ + "ama", + "hem", + "fakat", + "ila", + "lakin", + "ve", + "veya", + "veyahut" + ] + +MORPH_RULES = { + "ADP": {word: {"POS": "ADP"} for word in _postpositions}, + "ADV": {word: {"POS": "ADV"} for word in _adverbs}, + "SCONJ": {word: {"POS": "SCONJ"} for word in _subordinating_conjunctions}, + "CCONJ": {word: {"POS": "CCONJ"} for word in _coordinating_conjunctions}, + "PRON": { + "bana": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Dat" + }, + "benden": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Abl" + }, + "bende": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Loc" + }, + "beni": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Acc" + }, + "benle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Ins" + }, + "ben": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Nom" + }, + "benim": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Gen" + }, + "benimle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Sing", + "Case": "Ins"
+ }, + "sana": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Dat" + }, + "senden": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Abl" + }, + "sende": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Loc" + }, + "seni": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Acc" + }, + "senle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Ins" + }, + "sen": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Nom" + }, + "senin": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Gen" + }, + "seninle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Sing", + "Case": "Ins" + }, + "ona": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Dat" + }, + "ondan": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Abl" + }, + "onda": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Loc" + }, + "onu": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Acc" + }, + "onla": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Ins" + }, + "o": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Nom" + }, + "onun": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case": "Gen" + }, + "onunla": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Sing", + "Case":
"Ins" + }, + "bize": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Dat" + }, + "bizden": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Abl" + }, + "bizde": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Loc" + }, + "bizi": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Acc" + }, + "bizle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Ins" + }, + "biz": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Nom" + }, + "bizim": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Gen" + }, + "bizimle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "One", + "Number": "Plur", + "Case": "Ins" + }, + "size": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Dat" + }, + "sizden": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Abl" + }, + "sizde": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Loc" + }, + "sizi": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Acc" + }, + "sizle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Ins" + }, + "siz": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Nom" + }, + "sizin": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": 
"Two", + "Number": "Plur", + "Case": "Gen" + }, + "sizinle": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "PronType": "Prs", + "Person": "Two", + "Number": "Plur", + "Case": "Ins" + }, + "onlara": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Dat" + }, + "onlardan": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Abl" + }, + "onlarda": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Loc" + }, + "onları": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Acc" + }, + "onlarla": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Ins" + }, + "onlar": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Nom" + }, + "onların": { + "LEMMA": "PRON_LEMMA", + "POS": "PRON", + "Person": "Three", + "Number": "Plur", + "Case": "Gen" + }, + "buna": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Dat" + }, + "bundan": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Abl" + }, + "bunda": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Loc" + }, + "bunu": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Acc" + }, + "bunla": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "bu": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Nom" + }, + "bunun": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Gen" + }, + "bununla": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "şuna": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Dat" + }, + 
"şundan": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Abl" + }, + "şunda": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Loc" + }, + "şunu": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Acc" + }, + "şunla": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "şu": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Nom" + }, + "şunun": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Gen" + }, + "şununla": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "bunlara": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Dat" + }, + "bunlardan": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Abl" + }, + "bunlarda": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Loc" + }, + "bunları": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Acc" + }, + "bunlarla": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Ins" + }, + "bunlar": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Nom" + }, + "bunların": { + "LEMMA": "bu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Gen" + }, + "şunlara": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Dat" + }, + "şunlardan": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Abl" + }, + "şunlarda": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Loc" + }, + "şunları": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Acc" + }, + 
"şunlarla": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Ins" + }, + "şunlar": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Nom" + }, + "şunların": { + "LEMMA": "şu", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Gen" + }, + "buraya": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Dat" + }, + "buradan": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Abl" + }, + "burada": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "loc.sg" + }, + "burayı": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Acc" + }, + "burayla": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "bura": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Nom" + }, + "buranın": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Gen" + }, + "şuraya": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Dat" + }, + "şuradan": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Abl" + }, + "şurada": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "loc.sg" + }, + "şurayı": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Acc" + }, + "şurayla": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "şura": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Nom" + }, + "şuranın": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Gen" + }, + "oraya": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + 
"Case": "Dat" + }, + "oradan": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Abl" + }, + "orada": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "loc.sg" + }, + "orayı": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Acc" + }, + "orayla": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Ins" + }, + "ora": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Nom" + }, + "oranın": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Sing", + "Case": "Gen" + }, + "buralarına": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Dat" + }, + "buralarından": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Abl" + }, + "buralarında": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Loc" + }, + "buralarını": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Acc" + }, + "buralarıyla": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Ins" + }, + "buraları": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Nom" + }, + "buralarının": { + "LEMMA": "bura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Gen" + }, + "şuralarına": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Dat" + }, + "şuralarından": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Abl" + }, + "şuralarında": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Loc" + }, + "şuralarını": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Acc" + }, + "şuralarıyla": { + "LEMMA": 
"şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Ins" + }, + "şuraları": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Nom" + }, + "şuralarının": { + "LEMMA": "şura", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Gen" + }, + "oralarına": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Dat" + }, + "oralarından": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Abl" + }, + "oralarında": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Loc" + }, + "oralarını": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Acc" + }, + "oralarıyla": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Ins" + }, + "oraları": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Nom" + }, + "oralarının": { + "LEMMA": "ora", + "POS": "PRON", + "PronType": "Dem", + "Number": "Plur", + "Case": "Gen" + }, + "kendime": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Dat", + "Number": "Sing" + }, + "kendimden": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Abl", + "Number": "Sing" + }, + "kendimde": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Loc", + "Number": "Sing" + }, + "kendimi": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Acc", + "Number": "Sing" + }, + "kendimle": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Ins", + "Number": "Sing" + }, + "kendim": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + 
"Case": "Nom", + "Number": "Sing" + }, + "kendimin": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Gen", + "Number": "Sing" + }, + "kendine": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Dat", + "Number": "Sing" + }, + "kendinden": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Abl", + "Number": "Sing" + }, + "kendinde": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Loc", + "Number": "Sing" + }, + "kendini": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Acc", + "Number": "Sing" + }, + "kendiyle": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Ins", + "Number": "Sing" + }, + "kendi": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Nom", + "Number": "Sing" + }, + "kendinin": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Gen", + "Number": "Sing" + }, + "kendisine": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Dat", + "Number": "Sing" + }, + "kendisinden": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Abl", + "Number": "Sing" + }, + "kendisinde": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Loc", + "Number": "Sing" + }, + "kendisini": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Acc", + "Number": "Sing" + }, + "kendisiyle": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": 
"Three", + "Case": "Ins", + "Number": "Sing" + }, + "kendisi": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Nom", + "Number": "Sing" + }, + "kendisinin": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Gen", + "Number": "Sing" + }, + "kendimize": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Dat", + "Number": "Sing" + }, + "kendimizden": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Abl", + "Number": "Sing" + }, + "kendimizde": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Loc", + "Number": "Sing" + }, + "kendimizi": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Acc", + "Number": "Sing" + }, + "kendimizle": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Ins", + "Number": "Sing" + }, + "kendimiz": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Nom", + "Number": "Sing" + }, + "kendimizin": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "One", + "Case": "Gen", + "Number": "Sing" + }, + "kendinize": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Dat", + "Number": "Sing" + }, + "kendinizden": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Abl", + "Number": "Sing" + }, + "kendinizde": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Loc", + "Number": "Sing" + }, + "kendinizi": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": 
"Yes", + "Person": "Two", + "Case": "Acc", + "Number": "Sing" + }, + "kendinizle": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Ins", + "Number": "Sing" + }, + "kendiniz": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Nom", + "Number": "Sing" + }, + "kendinizin": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Two", + "Case": "Gen", + "Number": "Sing" + }, + "kendilerine": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Dat", + "Number": "Sing" + }, + "kendilerinden": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Abl", + "Number": "Sing" + }, + "kendilerinde": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Loc", + "Number": "Sing" + }, + "kendilerini": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Acc", + "Number": "Sing" + }, + "kendileriyle": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Ins", + "Number": "Sing" + }, + "kendileri": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Nom", + "Number": "Sing" + }, + "kendilerinin": { + "LEMMA": "kendi", + "POS": "PRON", + "PronType": "Prs", + "Reflex": "Yes", + "Person": "Three", + "Case": "Gen", + "Number": "Sing" + }, + "hangilerine": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "hangilerinden": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "hangilerinde": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number":
"Sing" + }, + "hangilerini": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "hangileriyle": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "hangileri": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "hangilerinin": { + "LEMMA": "hangileri", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "hangisine": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "hangisinden": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "hangisinde": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Sing" + }, + "hangisini": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "hangisiyle": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "hangisi": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "hangisinin": { + "LEMMA": "hangi", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "kime": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "kimden": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "kimde": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Sing" + }, + "kimi": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "kimle": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "kim": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "kimin": { + "LEMMA": 
"kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "kimlere": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Plur" + }, + "kimlerden": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Plur" + }, + "kimlerde": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Plur" + }, + "kimleri": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Plur" + }, + "kimlerle": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Plur" + }, + "kimler": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Plur" + }, + "kimlerin": { + "LEMMA": "kim", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Plur" + }, + "neye": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "neden": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "nede": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Sing" + }, + "neyi": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "neyle": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "ne": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "neyin": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "nelere": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Plur" + }, + "nelerden": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Plur" + }, + "nelerde": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Plur" + }, + "neleri": { + "LEMMA": "ne", + 
"POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Plur" + }, + "nelerle": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Plur" + }, + "neler": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Plur" + }, + "nelerin": { + "LEMMA": "ne", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Plur" + }, + "nereye": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "nereden": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "nerede": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Sing" + }, + "nereyi": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "nereyle": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "nere": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "nerenin": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "nerelere": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Plur" + }, + "nerelerden": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Plur" + }, + "nerelerde": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Plur" + }, + "nereleri": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Plur" + }, + "nerelerle": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Plur" + }, + "nereler": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Plur" + }, + "nerelerin": { + "LEMMA": "nere", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Plur" + }, + 
"kaçlarına": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "kaçlarından": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "kaçlarında": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Sing" + }, + "kaçlarını": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "kaçlarıyla": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "kaçları": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "kaçlarının": { + "LEMMA": "kaçları", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "kaçına": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Dat", + "Number": "Sing" + }, + "kaçından": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Abl", + "Number": "Sing" + }, + "kaçında": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Loc", + "Number": "Sing" + }, + "kaçını": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Acc", + "Number": "Sing" + }, + "kaçıyla": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Ins", + "Number": "Sing" + }, + "kaçı": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Nom", + "Number": "Sing" + }, + "kaçının": { + "LEMMA": "kaçı", + "POS": "PRON", + "PronType": "Int", + "Case": "Gen", + "Number": "Sing" + }, + "başkasına": { + "LEMMA": "başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "başkasından": { + "LEMMA": "başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "başkasında": { + "LEMMA": "başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "başkasını": { + "LEMMA":
"başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "başkasıyla": { + "LEMMA": "başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "başkası": { + "LEMMA": "başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "başkasının": { + "LEMMA": "başkası", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "başkalarına": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "başkalarından": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "başkalarında": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "başkalarını": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "başkalarıyla": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "başkaları": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "başkalarının": { + "LEMMA": "başkaları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "bazısına": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "bazısından": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "bazısında": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "bazısını": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "bazısıyla": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "bazısı": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, 
+ "bazısının": { + "LEMMA": "bazısı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "bazılarına": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "bazılarından": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "bazılarında": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "bazılarını": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "bazılarıyla": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "bazıları": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "bazılarının": { + "LEMMA": "bazıları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "birbirine": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Dat", + "Number": "Sing" + }, + "birbirinden": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Abl", + "Number": "Sing" + }, + "birbirinde": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Loc", + "Number": "Sing" + }, + "birbirini": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Acc", + "Number": "Sing" + }, + "birbiriyle": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Ins", + "Number": "Sing" + }, + "birbiri": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Nom", + "Number": "Sing" + }, + "birbirinin": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Gen", + "Number": "Sing" + }, + "birbirlerine": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Dat", + "Number": "Sing" + }, + "birbirlerinden": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case":
"Abl", + "Number": "Sing" + }, + "birbirlerinde": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Loc", + "Number": "Sing" + }, + "birbirlerini": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Acc", + "Number": "Sing" + }, + "birbirleriyle": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Ins", + "Number": "Sing" + }, + "birbirleri": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Nom", + "Number": "Sing" + }, + "birbirlerinin": { + "LEMMA": "birbir", + "POS": "PRON", + "PronType": "Rcp", + "Case": "Gen", + "Number": "Sing" + }, + "birçoğuna": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "birçoğundan": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "birçoğunda": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "birçoğunu": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "birçoğuyla": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "birçoğu": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "birçoğunun": { + "LEMMA": "birçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "birçoklarına": { + "LEMMA": "birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "birçoklarından": { + "LEMMA": "birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "birçoklarında": { + "LEMMA": "birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "birçoklarını": { + "LEMMA": "birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "birçoklarıyla": { + "LEMMA": 
"birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "birçokları": { + "LEMMA": "birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "birçoklarının": { + "LEMMA": "birçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "birilerine": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "birilerinden": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "birilerinde": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "birilerini": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "birileriyle": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "birileri": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "birilerinin": { + "LEMMA": "birileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "birisine": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "birisinden": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "birisinde": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "birisini": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "birisiyle": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "birisi": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "birisinin": { + "LEMMA": "biri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "birkaçına": 
{ + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "birkaçından": { + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "birkaçında": { + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "birkaçını": { + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "birkaçıyla": { + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "birkaçı": { + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "birkaçının": { + "LEMMA": "birkaçı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "birtakımına": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "birtakımından": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "birtakımında": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "birtakımını": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "birtakımıyla": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "birtakımı": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "birtakımının": { + "LEMMA": "birtakımı", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "böylesine": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "böylesinden": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "böylesinde": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case":
"Loc", + "Number": "Sing" + }, + "böylesini": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "böylesiyle": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "böylesi": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "böylesinin": { + "LEMMA": "böylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "şöylesine": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "şöylesinden": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "şöylesinde": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "şöylesini": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "şöylesiyle": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "şöylesi": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "şöylesinin": { + "LEMMA": "şöylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "öylesine": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "öylesinden": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "öylesinde": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "öylesini": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "öylesiyle": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "öylesi": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": 
"Nom", + "Number": "Sing" + }, + "öylesinin": { + "LEMMA": "öylesi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "böylelerine": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "böylelerinden": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "böylelerinde": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "böylelerini": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "böyleleriyle": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "böyleleri": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "böylelerinin": { + "LEMMA": "böyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "şöylelerine": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "şöylelerinden": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "şöylelerinde": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "şöylelerini": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "şöyleleriyle": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "şöyleleri": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "şöylelerinin": { + "LEMMA": "şöyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "öylelerine": { + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "öylelerinden": 
{ + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "öylelerinde": { + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "öylelerini": { + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "öyleleriyle": { + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "öyleleri": { + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "öylelerinin": { + "LEMMA": "öyleleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "çoklarına": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "çoklarından": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "çoklarında": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "çoklarını": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "çoklarıyla": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "çokları": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "çoklarının": { + "LEMMA": "çokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "çoğuna": { + "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "çoğundan": { + "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "çoğunda": { + "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "çoğunu": { + "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "çoğuyla": { 
+ "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "çoğu": { + "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "çoğunun": { + "LEMMA": "çoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "diğerine": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "diğerinden": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "diğerinde": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "diğerini": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "diğeriyle": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "diğeri": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "diğerinin": { + "LEMMA": "diğeri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "diğerlerine": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "diğerlerinden": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "diğerlerinde": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "diğerlerini": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "diğerleriyle": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "diğerleri": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "diğerlerinin": { + "LEMMA": "diğerleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, +
"hepinize": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "hepinizden": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "hepinizde": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "hepinizi": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "hepinizle": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "hepiniz": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "hepinizin": { + "LEMMA": "hepiniz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "hepimize": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "hepimizden": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "hepimizde": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "hepimizi": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "hepimizle": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "hepimiz": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "hepimizin": { + "LEMMA": "hepimiz", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "hepsine": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "hepsinden": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "hepsinde": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + 
"hepsini": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "hepsiyle": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "hepsi": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "hepsinin": { + "LEMMA": "hepsi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "herbirine": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "herbirinden": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "herbirinde": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "herbirini": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "herbiriyle": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "herbiri": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "herbirinin": { + "LEMMA": "herbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "herbirlerine": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "herbirlerinden": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "herbirlerinde": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "herbirlerini": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "herbirleriyle": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "herbirleri": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": 
"Nom", + "Number": "Sing" + }, + "herbirlerinin": { + "LEMMA": "herbirleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "herhangisine": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "herhangisinden": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "herhangisinde": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "herhangisini": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "herhangisiyle": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "herhangisi": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "herhangisinin": { + "LEMMA": "herhangisi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "herhangilerine": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "herhangilerinden": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "herhangilerinde": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "herhangilerini": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "herhangileriyle": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "herhangileri": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "herhangilerinin": { + "LEMMA": "herhangileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "herkese": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", 
+ "Case": "Dat", + "Number": "Sing" + }, + "herkesten": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "herkeste": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "herkesi": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "herkesle": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "herkes": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "herkesin": { + "LEMMA": "herkes", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "hiçbirisine": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "hiçbirisinden": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "hiçbirisinde": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "hiçbirisini": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "hiçbirisiyle": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "hiçbirisi": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "hiçbirisinin": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "hiçbirine": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "hiçbirinden": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "hiçbirinde": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "hiçbirini": { + "LEMMA": "hiçbiri", + "POS": "PRON", + 
"PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "hiçbiriyle": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "hiçbiri": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "hiçbirinin": { + "LEMMA": "hiçbiri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "kimisine": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "kimisinden": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "kimisinde": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "kimisini": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "kimisiyle": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "kimisi": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "kimisinin": { + "LEMMA": "kimi", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "kimilerine": { + "LEMMA": "kimileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "kimilerinden": { + "LEMMA": "kimileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "kimilerinde": { + "LEMMA": "kimileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "kimilerini": { + "LEMMA": "kimileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "kimileriyle": { + "LEMMA": "kimileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "kimileri": { + "LEMMA": "kimileri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "kimilerinin": { + "LEMMA": "kimileri", + "POS": "PRON", + 
"PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "kimseye": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "kimseden": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "kimsede": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "kimseyi": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "kimseyle": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "kimse": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "kimsenin": { + "LEMMA": "kimse", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "öbürüne": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "öbüründen": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "öbüründe": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "öbürünü": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "öbürüyle": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "öbürü": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "öbürünün": { + "LEMMA": "öbürü", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "öbürlerine": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "öbürlerinden": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "öbürlerinde": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", 
+ "Number": "Sing" + }, + "öbürlerini": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "öbürleriyle": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "öbürleri": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "öbürlerinin": { + "LEMMA": "öbürleri", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "ötekisine": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "ötekisinden": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "ötekisinde": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "ötekisini": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "ötekisiyle": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "ötekisi": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "ötekisinin": { + "LEMMA": "öteki", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "pekçoğuna": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "pekçoğundan": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "pekçoğunda": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "pekçoğunu": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "pekçoğuyla": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "pekçoğu": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": 
"Nom", + "Number": "Sing" + }, + "pekçoğunun": { + "LEMMA": "pekçoğu", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + }, + "pekçoklarına": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Dat", + "Number": "Sing" + }, + "pekçoklarından": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Abl", + "Number": "Sing" + }, + "pekçoklarında": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Loc", + "Number": "Sing" + }, + "pekçoklarını": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Acc", + "Number": "Sing" + }, + "pekçoklarıyla": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Ins", + "Number": "Sing" + }, + "pekçokları": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Nom", + "Number": "Sing" + }, + "pekçoklarının": { + "LEMMA": "pekçokları", + "POS": "PRON", + "PronType": "Ind", + "Case": "Gen", + "Number": "Sing" + } + } + } + +for tag, rules in MORPH_RULES.items(): + for key, attrs in dict(rules).items(): + rules[key.title()] = attrs From 7e821c2776fb22fe6e52af558832c732bb3fa2f2 Mon Sep 17 00:00:00 2001 From: Duygu Altinok Date: Wed, 7 Oct 2020 11:07:52 +0200 Subject: [PATCH 08/31] Turkish language syntax iterators (#6191) * added tr_vocab to config * basic test * added syntax iterator to Turkish lang class * first version for Turkish syntax iter, without flat * added simple tests with nmod, amod, det * more tests to amod and nmod * separated noun chunks and parser test * rearrangement after nchunk parser separation * added recursive NPs * tests with complicated recursive NPs * tests with conjed NPs * additional tests for conj NP * small modification for shaving off conj from NP * added tests with flat * more tests with flat * added examples with flats conjed * added inner func for flat trick * corrected parse Co-authored-by: Adriane Boyd --- spacy/lang/tr/__init__.py | 3 + 
spacy/lang/tr/syntax_iterators.py | 59 +++ spacy/tests/conftest.py | 3 + spacy/tests/lang/tr/test_noun_chunks.py | 16 + spacy/tests/lang/tr/test_parser.py | 573 ++++++++++++++++++++++++ 5 files changed, 654 insertions(+) create mode 100644 spacy/lang/tr/syntax_iterators.py create mode 100644 spacy/tests/lang/tr/test_noun_chunks.py create mode 100644 spacy/tests/lang/tr/test_parser.py diff --git a/spacy/lang/tr/__init__.py b/spacy/lang/tr/__init__.py index 78a174f15..fb0883a68 100644 --- a/spacy/lang/tr/__init__.py +++ b/spacy/lang/tr/__init__.py @@ -3,9 +3,11 @@ from __future__ import unicode_literals from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS from .stop_words import STOP_WORDS +from .syntax_iterators import SYNTAX_ITERATORS from .lex_attrs import LEX_ATTRS from .morph_rules import MORPH_RULES + from ..tokenizer_exceptions import BASE_EXCEPTIONS from ..norm_exceptions import BASE_NORMS from ...language import Language @@ -22,6 +24,7 @@ class TurkishDefaults(Language.Defaults): ) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) stop_words = STOP_WORDS + syntax_iterators = SYNTAX_ITERATORS morph_rules = MORPH_RULES diff --git a/spacy/lang/tr/syntax_iterators.py b/spacy/lang/tr/syntax_iterators.py new file mode 100644 index 000000000..6cab3b260 --- /dev/null +++ b/spacy/lang/tr/syntax_iterators.py @@ -0,0 +1,59 @@ +# coding: utf8 +from __future__ import unicode_literals + +from ...symbols import NOUN, PROPN, PRON +from ...errors import Errors + + +def noun_chunks(doclike): + """ + Detect base noun phrases from a dependency parse. Works on both Doc and Span. + """ + # Please see documentation for Turkish NP structure + labels = [ + "nsubj", + "iobj", + "obj", + "obl", + "appos", + "orphan", + "dislocated", + "ROOT", + ] + doc = doclike.doc # Ensure works on both Doc and Span. 
+ if not doc.is_parsed: + raise ValueError(Errors.E029) + + np_deps = [doc.vocab.strings.add(label) for label in labels] + conj = doc.vocab.strings.add("conj") + flat = doc.vocab.strings.add("flat") + np_label = doc.vocab.strings.add("NP") + + def extend_right(w): # Extend the NP over any flat tokens to its right + rindex = w.i + 1 + for rdep in doc[w.i].rights: # Extend the span to the right if there is a flat + if rdep.dep == flat and rdep.pos in (NOUN, PROPN): + rindex = rdep.i + 1 + else: + break + return rindex + + prev_end = len(doc) + 1 + for i, word in reversed(list(enumerate(doclike))): + if word.pos not in (NOUN, PROPN, PRON): + continue + # Prevent nested chunks from being produced + if word.i >= prev_end: + continue + if word.dep in np_deps: + prev_end = word.left_edge.i + yield word.left_edge.i, extend_right(word), np_label + elif word.dep == conj: + cc_token = word.left_edge + prev_end = cc_token.i + yield cc_token.right_edge.i + 1, extend_right(word), np_label # Shave off cc tokens from the NP + + + + +SYNTAX_ITERATORS = {"noun_chunks": noun_chunks} diff --git a/spacy/tests/conftest.py b/spacy/tests/conftest.py index 8f1ac55cb..dc742ce30 100644 --- a/spacy/tests/conftest.py +++ b/spacy/tests/conftest.py @@ -242,6 +242,9 @@ def th_tokenizer(): def tr_tokenizer(): return get_lang_class("tr").Defaults.create_tokenizer() +@pytest.fixture(scope="session") +def tr_vocab(): + return get_lang_class("tr").Defaults.create_vocab() @pytest.fixture(scope="session") def tt_tokenizer(): diff --git a/spacy/tests/lang/tr/test_noun_chunks.py b/spacy/tests/lang/tr/test_noun_chunks.py new file mode 100644 index 000000000..98a1f355f --- /dev/null +++ b/spacy/tests/lang/tr/test_noun_chunks.py @@ -0,0 +1,16 @@ +# coding: utf-8 +from __future__ import unicode_literals + +import pytest + + +def test_noun_chunks_is_parsed(tr_tokenizer): + """Test that noun_chunks raises ValueError for 'tr' language if Doc is not parsed.
+ To check this, we construct a Doc + with a new Vocab here and force is_parsed to 'False' + to make sure the noun chunker doesn't run. + """ + doc = tr_tokenizer("Dün seni gördüm.") + doc.is_parsed = False + with pytest.raises(ValueError): + list(doc.noun_chunks) diff --git a/spacy/tests/lang/tr/test_parser.py b/spacy/tests/lang/tr/test_parser.py new file mode 100644 index 000000000..707b0183d --- /dev/null +++ b/spacy/tests/lang/tr/test_parser.py @@ -0,0 +1,573 @@ +# coding: utf-8 +from __future__ import unicode_literals + +from ...util import get_doc + + +def test_tr_noun_chunks_amod_simple(tr_tokenizer): + text = "sarı kedi" + heads = [1, 0] + deps = ["amod", "ROOT"] + tags = ["ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "sarı kedi " + + +def test_tr_noun_chunks_nmod_simple(tr_tokenizer): + text = "arkadaşımın kedisi" # my friend's cat + heads = [1, 0] + deps = ["nmod", "ROOT"] + tags = ["NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "arkadaşımın kedisi " + + +def test_tr_noun_chunks_determiner_simple(tr_tokenizer): + text = "O kedi" # that cat + heads = [1, 0] + deps = ["det", "ROOT"] + tags = ["DET", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "O kedi " + + +def test_tr_noun_chunks_nmod_amod(tr_tokenizer): + text = "okulun eski müdürü" + heads = [2, 1, 0] + deps = ["nmod", "amod", "ROOT"] + tags = ["NOUN", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( +
tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "okulun eski müdürü " + + +def test_tr_noun_chunks_one_det_one_adj_simple(tr_tokenizer): + text = "O sarı kedi" + heads = [2, 1, 0] + deps = ["det", "amod", "ROOT"] + tags = ["DET", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "O sarı kedi " + + +def test_tr_noun_chunks_two_adjs_simple(tr_tokenizer): + text = "beyaz tombik kedi" + heads = [2, 1, 0] + deps = ["amod", "amod", "ROOT"] + tags = ["ADJ", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "beyaz tombik kedi " + + +def test_tr_noun_chunks_one_det_two_adjs_simple(tr_tokenizer): + text = "o beyaz tombik kedi" + heads = [3, 2, 1, 0] + deps = ["det", "amod", "amod", "ROOT"] + tags = ["DET", "ADJ", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "o beyaz tombik kedi " + + +def test_tr_noun_chunks_nmod_two(tr_tokenizer): + text = "kızın saçının rengi" + heads = [1, 1, 0] + deps = ["nmod", "nmod", "ROOT"] + tags = ["NOUN", "NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "kızın saçının rengi " + + +def test_tr_noun_chunks_chain_nmod_with_adj(tr_tokenizer): + text = "ev 
sahibinin tatlı köpeği" + heads = [1, 2, 1, 0] + deps = ["nmod", "nmod", "amod", "ROOT"] + tags = ["NOUN", "NOUN", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "ev sahibinin tatlı köpeği " + + +def test_tr_noun_chunks_chain_nmod_with_acl(tr_tokenizer): + text = "ev sahibinin gelen köpeği" + heads = [1, 2, 1, 0] + deps = ["nmod", "nmod", "acl", "ROOT"] + tags = ["NOUN", "NOUN", "VERB", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "ev sahibinin gelen köpeği " + + +def test_tr_noun_chunks_chain_nmod_head_with_amod_acl(tr_tokenizer): + text = "arabanın kırdığım sol aynası" + heads = [3, 2, 1, 0] + deps = ["nmod", "acl", "amod", "ROOT"] + tags = ["NOUN", "VERB", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "arabanın kırdığım sol aynası " + + +def test_tr_noun_chunks_nmod_three(tr_tokenizer): + text = "güney Afrika ülkelerinden Mozambik" + heads = [1, 1, 1, 0] + deps = ["nmod", "nmod", "nmod", "ROOT"] + tags = ["NOUN", "PROPN", "NOUN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "güney Afrika ülkelerinden Mozambik " + + +def test_tr_noun_chunks_det_amod_nmod(tr_tokenizer): + text = "bazı eski oyun kuralları" + heads = [3, 2, 1, 0] + deps = ["det", "nmod", "nmod", "ROOT"] + tags = ["DET", "ADJ", "NOUN", "NOUN"] 
+ tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "bazı eski oyun kuralları " + + +def test_tr_noun_chunks_acl_simple(tr_tokenizer): + text = "bahçesi olan okul" + heads = [2, -1, 0] + deps = ["acl", "cop", "ROOT"] + tags = ["NOUN", "AUX", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "bahçesi olan okul " + + +def test_tr_noun_chunks_acl_verb(tr_tokenizer): + text = "sevdiğim sanatçılar" + heads = [1, 0] + deps = ["acl", "ROOT"] + tags = ["VERB", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "sevdiğim sanatçılar " + + +def test_tr_noun_chunks_acl_nmod(tr_tokenizer): + text = "en sevdiğim ses sanatçısı" + heads = [1, 2, 1, 0] + deps = ["advmod", "acl", "nmod", "ROOT"] + tags = ["ADV", "VERB", "NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "en sevdiğim ses sanatçısı " + + +def test_tr_noun_chunks_acl_det_nmod(tr_tokenizer): + text = "bildiğim bir turizm şirketi" + heads = [3, 2, 1, 0] + deps = ["acl", "det", "nmod", "ROOT"] + tags = ["VERB", "DET", "NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "bildiğim bir turizm şirketi " + + +def 
test_tr_noun_chunks_np_recursive_nsubj_to_root(tr_tokenizer): + text = "Simge'nin okuduğu kitap" + heads = [1, 1, 0] + deps = ["nsubj", "acl", "ROOT"] + tags = ["PROPN", "VERB", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Simge'nin okuduğu kitap " + + +def test_tr_noun_chunks_np_recursive_nsubj_attached_to_pron_root(tr_tokenizer): + text = "Simge'nin konuşabileceği birisi" + heads = [1, 1, 0] + deps = ["nsubj", "acl", "ROOT"] + tags = ["PROPN", "VERB", "PRON"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Simge'nin konuşabileceği birisi " + + +def test_tr_noun_chunks_np_recursive_nsubj_in_subnp(tr_tokenizer): + text = "Simge'nin yarın gideceği yer" + heads = [2, 1, 1, 0] + deps = ["nsubj", "obl", "acl", "ROOT"] + tags = ["PROPN", "NOUN", "VERB", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Simge'nin yarın gideceği yer " + + +def test_tr_noun_chunks_np_recursive_two_nmods(tr_tokenizer): + text = "ustanın kapısını değiştireceği çamaşır makinası" + heads = [2, 1, 2, 1, 0] + deps = ["nsubj", "obj", "acl", "nmod", "ROOT"] + tags = ["NOUN", "NOUN", "VERB", "NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "ustanın kapısını değiştireceği çamaşır makinası " + + +def test_tr_noun_chunks_np_recursive_four_nouns(tr_tokenizer): + 
text = "kızına piyano dersi verdiğim hanım" + heads = [3, 1, 1, 1, 0] + deps = ["obl", "nmod", "obj", "acl", "ROOT"] + tags = ["NOUN", "NOUN", "NOUN", "VERB", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "kızına piyano dersi verdiğim hanım " + + +def test_tr_noun_chunks_np_recursive_no_nmod(tr_tokenizer): + text = "içine birkaç çiçek konmuş olan bir vazo" + heads = [3, 1, 1, 3, -1, 1, 0] + deps = ["obl", "det", "nsubj", "acl", "aux", "det", "ROOT"] + tags = ["ADP", "DET", "NOUN", "VERB", "AUX", "DET", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "içine birkaç çiçek konmuş olan bir vazo " + + +def test_tr_noun_chunks_np_recursive_long_two_acls(tr_tokenizer): + text = "içine Simge'nin bahçesinden toplanmış birkaç çiçeğin konmuş olduğu bir vazo" + heads = [6, 1, 1, 2, 1, 1, 3, -1, 1, 0] + deps = ["obl", "nmod" , "obl", "acl", "det", "nsubj", "acl", "aux", "det", "ROOT"] + tags = ["ADP", "PROPN", "NOUN", "VERB", "DET", "NOUN", "VERB", "AUX", "DET", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "içine Simge'nin bahçesinden toplanmış birkaç çiçeğin konmuş olduğu bir vazo " + + +def test_tr_noun_chunks_two_nouns_in_nmod(tr_tokenizer): + text = "kız ve erkek çocuklar" + heads = [3, 1, -2, 0] + deps = ["nmod", "cc", "conj", "ROOT"] + tags = ["NOUN", "CCONJ", "NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = 
list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "kız ve erkek çocuklar " + +def test_tr_noun_chunks_two_adjs_conjed(tr_tokenizer): + text = "tatlı ve gürbüz çocuklar" + heads = [3, 1, -2, 0] + deps = ["amod", "cc", "conj", "ROOT"] + tags = ["ADJ", "CCONJ", "NOUN", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "tatlı ve gürbüz çocuklar " + + +def test_tr_noun_chunks_conj_simple(tr_tokenizer): + text = "Sen ya da ben" + heads = [0, 2, -1, -3] + deps = ["ROOT", "cc", "fixed", "conj"] + tags = ["PRON", "CCONJ", "CCONJ", "PRON"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 2 + assert chunks[0].text_with_ws == "ben " + assert chunks[1].text_with_ws == "Sen " + +def test_tr_noun_chunks_conj_three(tr_tokenizer): + text = "sen, ben ve ondan" + heads = [0, 1, -2, 1, -4] + deps = ["ROOT", "punct", "conj", "cc", "conj"] + tags = ["PRON", "PUNCT", "PRON", "CCONJ", "PRON"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 3 + assert chunks[0].text_with_ws == "ondan " + assert chunks[1].text_with_ws == "ben " + assert chunks[2].text_with_ws == "sen " + + +def test_tr_noun_chunks_conj_three_fixed(tr_tokenizer): + text = "ben ya da sen ya da onlar" + heads = [0, 2, -1, -3, 2, -1, -3] + deps = ["ROOT", "cc", "fixed", "conj", "cc", "fixed", "conj"] + tags = ["PRON", "CCONJ", "CCONJ", "PRON", "CCONJ", "CCONJ", "PRON"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert 
len(chunks) == 3 + assert chunks[0].text_with_ws == "onlar " + assert chunks[1].text_with_ws == "sen " + assert chunks[2].text_with_ws == "ben " + + +def test_tr_noun_chunks_conj_and_adj_phrase(tr_tokenizer): + text = "ben ve akıllı çocuk" + heads = [0, 2, 1, -3] + deps = ["ROOT", "cc", "amod", "conj"] + tags = ["PRON", "CCONJ", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 2 + assert chunks[0].text_with_ws == "akıllı çocuk " + assert chunks[1].text_with_ws == "ben " + + +def test_tr_noun_chunks_conj_fixed_adj_phrase(tr_tokenizer): + text = "ben ya da akıllı çocuk" + heads = [0, 3, -1, 1, -4] + deps = ["ROOT", "cc", "fixed", "amod", "conj"] + tags = ["PRON", "CCONJ", "CCONJ", "ADJ", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 2 + assert chunks[0].text_with_ws == "akıllı çocuk " + assert chunks[1].text_with_ws == "ben " + + +def test_tr_noun_chunks_conj_subject(tr_tokenizer): + text = "Sen ve ben iyi anlaşıyoruz" + heads = [4, 1, -2, -1, 0] + deps = ["nsubj", "cc", "conj", "adv", "ROOT"] + tags = ["PRON", "CCONJ", "PRON", "ADV", "VERB"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 2 + assert chunks[0].text_with_ws == "ben " + assert chunks[1].text_with_ws == "Sen " + + +def test_tr_noun_chunks_conj_noun_head_verb(tr_tokenizer): + text = "Simge babasını görmüyormuş, annesini değil" + heads = [2, 1, 0, 1, -2, -1] + deps = ["nsubj", "obj", "ROOT", "punct", "conj", "aux"] + tags = ["PROPN", "NOUN", "VERB", "PUNCT", "NOUN", "AUX"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in 
tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 3 + assert chunks[0].text_with_ws == "annesini " + assert chunks[1].text_with_ws == "babasını " + assert chunks[2].text_with_ws == "Simge " + + +def test_tr_noun_chunks_flat_simple(tr_tokenizer): + text = "New York" + heads = [0, -1] + deps = ["ROOT", "flat"] + tags = ["PROPN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "New York " + + +def test_tr_noun_chunks_flat_names_and_title(tr_tokenizer): + text = "Gazi Mustafa Kemal" + heads = [1, 0, -1] + deps = ["nmod", "ROOT", "flat"] + tags = ["PROPN", "PROPN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Gazi Mustafa Kemal " + + +def test_tr_noun_chunks_flat_names_and_trailing_title(tr_tokenizer): + text = "Ahmet Vefik Paşa" + heads = [2, -1, 0] + deps = ["nmod", "flat", "ROOT"] + tags = ["PROPN", "PROPN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Ahmet Vefik Paşa " + + +def test_tr_noun_chunks_flat_name_lastname_and_title(tr_tokenizer): + text = "Cumhurbaşkanı Ahmet Necdet Sezer" + heads = [1, 0, -1, -2] + deps = ["nmod", "ROOT", "flat", "flat"] + tags = ["NOUN", "PROPN", "PROPN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Cumhurbaşkanı Ahmet Necdet Sezer " + + +def 
test_tr_noun_chunks_flat_in_nmod(tr_tokenizer): + text = "Ahmet Sezer adında bir öğrenci" + heads = [2, -1, 2, 1, 0] + deps = ["nmod", "flat", "nmod", "det", "ROOT"] + tags = ["PROPN", "PROPN", "NOUN", "DET", "NOUN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Ahmet Sezer adında bir öğrenci " + + +def test_tr_noun_chunks_flat_and_chain_nmod(tr_tokenizer): + text = "Batı Afrika ülkelerinden Sierra Leone" + heads = [1, 1, 1, 0, -1] + deps = ["nmod", "nmod", "nmod", "ROOT", "flat"] + tags = ["NOUN", "PROPN", "NOUN", "PROPN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 1 + assert chunks[0].text_with_ws == "Batı Afrika ülkelerinden Sierra Leone " + + +def test_tr_noun_chunks_two_flats_conjed(tr_tokenizer): + text = "New York ve Sierra Leone" + heads = [0, -1, 1, -3, -1] + deps = ["ROOT", "flat", "cc", "conj", "flat"] + tags = ["PROPN", "PROPN", "CCONJ", "PROPN", "PROPN"] + tokens = tr_tokenizer(text) + doc = get_doc( + tokens.vocab, words=[t.text for t in tokens], tags=tags, heads=heads, deps=deps + ) + chunks = list(doc.noun_chunks) + assert len(chunks) == 2 + assert chunks[0].text_with_ws == "Sierra Leone " + assert chunks[1].text_with_ws == "New York " From 9fc8392b3827d009e7b54197e84f2d5b15e72cdf Mon Sep 17 00:00:00 2001 From: Wannaphong Phatthiyaphaibun Date: Wed, 7 Oct 2020 16:12:01 +0700 Subject: [PATCH 09/31] Add Thai tag map (LST20 Corpus) (#6163) * Add Thai tag map (LST20 Corpus) By @korakot * Update tag_map.py * Update tag_map.py * Update tag_map.py --- spacy/lang/th/tag_map.py | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/spacy/lang/th/tag_map.py b/spacy/lang/th/tag_map.py index 
119a2f6a0..3c0d3479b 100644 --- a/spacy/lang/th/tag_map.py +++ b/spacy/lang/th/tag_map.py @@ -16,25 +16,33 @@ TAG_MAP = { "CMTR": {POS: NOUN}, "CFQC": {POS: NOUN}, "CVBL": {POS: NOUN}, + "CL": {POS: NOUN}, + "FX": {POS: NOUN}, + "NN": {POS: NOUN}, # VERB "VACT": {POS: VERB}, "VSTA": {POS: VERB}, + "VV": {POS: VERB}, # PRON "PRON": {POS: PRON}, "NPRP": {POS: PRON}, + "PR": {POS: PRON}, # ADJ "ADJ": {POS: ADJ}, "NONM": {POS: ADJ}, "VATT": {POS: ADJ}, "DONM": {POS: ADJ}, + "AJ": {POS: ADJ}, # ADV "ADV": {POS: ADV}, "ADVN": {POS: ADV}, "ADVI": {POS: ADV}, "ADVP": {POS: ADV}, "ADVS": {POS: ADV}, - # INT + "AV": {POS: ADV}, + # INTJ "INT": {POS: INTJ}, + "IJ": {POS: INTJ}, # PRON "PROPN": {POS: PROPN}, "PPRS": {POS: PROPN}, @@ -56,6 +64,7 @@ TAG_MAP = { "NCNM": {POS: NUM}, "NLBL": {POS: NUM}, "DCNM": {POS: NUM}, + "NU": {POS: NUM}, # AUX "AUX": {POS: AUX}, "XVBM": {POS: AUX}, @@ -63,12 +72,15 @@ TAG_MAP = { "XVMM": {POS: AUX}, "XVBB": {POS: AUX}, "XVAE": {POS: AUX}, + "AX": {POS: AUX}, # ADP "ADP": {POS: ADP}, "RPRE": {POS: ADP}, + "PS": {POS: ADP}, # CCONJ "CCONJ": {POS: CCONJ}, "JCRG": {POS: CCONJ}, + "CC": {POS: CCONJ}, # SCONJ "SCONJ": {POS: SCONJ}, "PREL": {POS: SCONJ}, @@ -82,6 +94,7 @@ TAG_MAP = { "AITT": {POS: PART}, "NEG": {POS: PART}, "EITT": {POS: PART}, + "PA": {POS: PART}, # PUNCT "PUNCT": {POS: PUNCT}, "PUNC": {POS: PUNCT}, From 2998131416a57ff27ca050e44a2722a336a1c088 Mon Sep 17 00:00:00 2001 From: Sofie Van Landeghem Date: Thu, 8 Oct 2020 00:43:46 +0200 Subject: [PATCH 10/31] Reproducibility for TextCat and Tok2Vec (#6218) * ensure fixed seed in HashEmbed layers * forgot about the joys of python 2 --- spacy/_ml.py | 8 +++--- spacy/ml/_legacy_tok2vec.py | 8 +++--- spacy/ml/tok2vec.py | 8 +++--- spacy/tests/regression/test_issue6177.py | 35 ++++++++++++++++++++++++ 4 files changed, 47 insertions(+), 12 deletions(-) create mode 100644 spacy/tests/regression/test_issue6177.py diff --git a/spacy/_ml.py b/spacy/_ml.py index d947aab1c..3fc2c4718 100644 --- 
a/spacy/_ml.py +++ b/spacy/_ml.py @@ -654,10 +654,10 @@ def build_text_classifier(nr_class, width=64, **cfg): ) return model - lower = HashEmbed(width, nr_vector, column=1) - prefix = HashEmbed(width // 2, nr_vector, column=2) - suffix = HashEmbed(width // 2, nr_vector, column=3) - shape = HashEmbed(width // 2, nr_vector, column=4) + lower = HashEmbed(width, nr_vector, column=1, seed=10) + prefix = HashEmbed(width // 2, nr_vector, column=2, seed=11) + suffix = HashEmbed(width // 2, nr_vector, column=3, seed=12) + shape = HashEmbed(width // 2, nr_vector, column=4, seed=13) trained_vectors = FeatureExtracter( [ORTH, LOWER, PREFIX, SUFFIX, SHAPE, ID] diff --git a/spacy/ml/_legacy_tok2vec.py b/spacy/ml/_legacy_tok2vec.py index 3e41b1c6a..c4291b5d6 100644 --- a/spacy/ml/_legacy_tok2vec.py +++ b/spacy/ml/_legacy_tok2vec.py @@ -27,16 +27,16 @@ def Tok2Vec(width, embed_size, **kwargs): bilstm_depth = kwargs.get("bilstm_depth", 0) cols = [ID, NORM, PREFIX, SUFFIX, SHAPE, ORTH] with Model.define_operators({">>": chain, "|": concatenate, "**": clone}): - norm = HashEmbed(width, embed_size, column=cols.index(NORM), name="embed_norm") + norm = HashEmbed(width, embed_size, column=cols.index(NORM), name="embed_norm", seed=6) if subword_features: prefix = HashEmbed( - width, embed_size // 2, column=cols.index(PREFIX), name="embed_prefix" + width, embed_size // 2, column=cols.index(PREFIX), name="embed_prefix", seed=7 ) suffix = HashEmbed( - width, embed_size // 2, column=cols.index(SUFFIX), name="embed_suffix" + width, embed_size // 2, column=cols.index(SUFFIX), name="embed_suffix", seed=8 ) shape = HashEmbed( - width, embed_size // 2, column=cols.index(SHAPE), name="embed_shape" + width, embed_size // 2, column=cols.index(SHAPE), name="embed_shape", seed=9 ) else: prefix, suffix, shape = (None, None, None) diff --git a/spacy/ml/tok2vec.py b/spacy/ml/tok2vec.py index 8f86475ef..6949d83e2 100644 --- a/spacy/ml/tok2vec.py +++ b/spacy/ml/tok2vec.py @@ -42,16 +42,16 @@ def 
MultiHashEmbed(config): width = config["width"] rows = config["rows"] - norm = HashEmbed(width, rows, column=cols.index("NORM"), name="embed_norm") + norm = HashEmbed(width, rows, column=cols.index("NORM"), name="embed_norm", seed=1) if config["use_subwords"]: prefix = HashEmbed( - width, rows // 2, column=cols.index("PREFIX"), name="embed_prefix" + width, rows // 2, column=cols.index("PREFIX"), name="embed_prefix", seed=2 ) suffix = HashEmbed( - width, rows // 2, column=cols.index("SUFFIX"), name="embed_suffix" + width, rows // 2, column=cols.index("SUFFIX"), name="embed_suffix", seed=3 ) shape = HashEmbed( - width, rows // 2, column=cols.index("SHAPE"), name="embed_shape" + width, rows // 2, column=cols.index("SHAPE"), name="embed_shape", seed=4 ) if config.get("@pretrained_vectors"): glove = make_layer(config["@pretrained_vectors"]) diff --git a/spacy/tests/regression/test_issue6177.py b/spacy/tests/regression/test_issue6177.py new file mode 100644 index 000000000..c806011c3 --- /dev/null +++ b/spacy/tests/regression/test_issue6177.py @@ -0,0 +1,35 @@ +# coding: utf8 +from __future__ import unicode_literals + +from spacy.lang.en import English +from spacy.util import fix_random_seed + + +def test_issue6177(): + """Test that after fixing the random seed, the results of the pipeline are truly identical""" + + # NOTE: no need to transform this code to v3 when 'master' is merged into 'develop'. 
+    # A similar test exists already for v3: test_issue5551
+    # This is just a backport
+
+    results = []
+    for i in range(3):
+        fix_random_seed(0)
+        nlp = English()
+        example = (
+            "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.",
+            {"cats": {"Label1": 1.0, "Label2": 0.0, "Label3": 0.0}},
+        )
+        textcat = nlp.create_pipe("textcat")
+        nlp.add_pipe(textcat)
+        for label in set(example[1]["cats"]):
+            textcat.add_label(label)
+        nlp.begin_training()
+        # Store the result of each iteration
+        result = textcat.model.predict([nlp.make_doc(example[0])])
+        results.append(list(result[0]))
+
+    # All results should be the same because of the fixed seed
+    assert len(results) == 3
+    assert results[0] == results[1]
+    assert results[0] == results[2]
\ No newline at end of file
From 241cd112f57bbac63146f9561f77ef9f790527b3 Mon Sep 17 00:00:00 2001
From: Sofie Van Landeghem
Date: Thu, 8 Oct 2020 00:44:16 +0200
Subject: [PATCH 11/31] add reenabled pipe names back to the meta before serializing (#6219)

---
 spacy/cli/train.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/spacy/cli/train.py b/spacy/cli/train.py
index 0614c7519..0a640d909 100644
--- a/spacy/cli/train.py
+++ b/spacy/cli/train.py
@@ -204,7 +204,7 @@ def train(
             "positive_label": textcat_positive_label,
         }
         if pipe not in nlp.pipe_names:
-            msg.text("Adding component to base model '{}'".format(pipe))
+            msg.text("Adding component to base model: '{}'".format(pipe))
             nlp.add_pipe(nlp.create_pipe(pipe, config=pipe_cfg))
             pipes_added = True
         elif replace_components:
@@ -574,6 +574,7 @@ def train(
     best_pipes = nlp.pipe_names
     if disabled_pipes:
         disabled_pipes.restore()
+        meta["pipeline"] = nlp.pipe_names
     with nlp.use_params(optimizer.averages):
         final_model_path = output_path / "model-final"
         nlp.to_disk(final_model_path)
From 81afe9b19df2c29b4c29062cf1b47582a69ded01 Mon Sep 17 00:00:00 2001
From: Baranitharan
Date: Thu, 8 Oct 2020 08:17:25 +0530
Subject: [PATCH 12/31] Update examples.py 
--- spacy/lang/ta/examples.py | 1 + 1 file changed, 1 insertion(+) diff --git a/spacy/lang/ta/examples.py b/spacy/lang/ta/examples.py index c34e77129..3aa72ddcc 100644 --- a/spacy/lang/ta/examples.py +++ b/spacy/lang/ta/examples.py @@ -23,4 +23,5 @@ sentences = [ "தன்னாட்சி கார்கள் காப்பீட்டு பொறுப்பை உற்பத்தியாளரிடம் மாற்றுகின்றன", "நடைபாதை விநியோக ரோபோக்களை தடை செய்வதை சான் பிரான்சிஸ்கோ கருதுகிறது", "லண்டன் ஐக்கிய இராச்சியத்தில் ஒரு பெரிய நகரம்." + "என்ன வேலை செய்கிறீர்கள்?" ] From d6037c18609e8bd3a94d705cddb126a6aac8cd34 Mon Sep 17 00:00:00 2001 From: Baranitharan Date: Thu, 8 Oct 2020 08:22:58 +0530 Subject: [PATCH 13/31] added sentence --- spacy/lang/ta/examples.py | 1 + 1 file changed, 1 insertion(+) diff --git a/spacy/lang/ta/examples.py b/spacy/lang/ta/examples.py index 3aa72ddcc..fd1ca8729 100644 --- a/spacy/lang/ta/examples.py +++ b/spacy/lang/ta/examples.py @@ -24,4 +24,5 @@ sentences = [ "நடைபாதை விநியோக ரோபோக்களை தடை செய்வதை சான் பிரான்சிஸ்கோ கருதுகிறது", "லண்டன் ஐக்கிய இராச்சியத்தில் ஒரு பெரிய நகரம்." "என்ன வேலை செய்கிறீர்கள்?" + "எந்த கல்லூரியில் படிக்கிறாய்?" ] From 7f92a5ee6abb4a3186dfa809c6889756d4bac29a Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Tue, 13 Oct 2020 11:03:35 +0200 Subject: [PATCH 14/31] Update spacy/lang/ta/examples.py --- spacy/lang/ta/examples.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spacy/lang/ta/examples.py b/spacy/lang/ta/examples.py index fd1ca8729..4700e0c7f 100644 --- a/spacy/lang/ta/examples.py +++ b/spacy/lang/ta/examples.py @@ -22,7 +22,7 @@ sentences = [ "ஆப்பிள் நிறுவனம் யு.கே. தொடக்க நிறுவனத்தை ஒரு லட்சம் கோடிக்கு வாங்கப் பார்க்கிறது", "தன்னாட்சி கார்கள் காப்பீட்டு பொறுப்பை உற்பத்தியாளரிடம் மாற்றுகின்றன", "நடைபாதை விநியோக ரோபோக்களை தடை செய்வதை சான் பிரான்சிஸ்கோ கருதுகிறது", - "லண்டன் ஐக்கிய இராச்சியத்தில் ஒரு பெரிய நகரம்." - "என்ன வேலை செய்கிறீர்கள்?" + "லண்டன் ஐக்கிய இராச்சியத்தில் ஒரு பெரிய நகரம்.", + "என்ன வேலை செய்கிறீர்கள்?", "எந்த கல்லூரியில் படிக்கிறாய்?" 
] From c23041ae6006fdcf9d942df928a5b08b0ae1c781 Mon Sep 17 00:00:00 2001 From: svlandeg Date: Tue, 13 Oct 2020 16:26:53 +0200 Subject: [PATCH 15/31] component tests single or multiple prediction --- spacy/tests/pipeline/test_models.py | 46 +++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 spacy/tests/pipeline/test_models.py diff --git a/spacy/tests/pipeline/test_models.py b/spacy/tests/pipeline/test_models.py new file mode 100644 index 000000000..d1c877953 --- /dev/null +++ b/spacy/tests/pipeline/test_models.py @@ -0,0 +1,46 @@ +from typing import List +import pytest +from numpy.testing import assert_equal +from thinc.api import get_current_ops, Model, data_validation +from thinc.types import Array2d + +from spacy.lang.en import English +from spacy.tokens import Doc + +OPS = get_current_ops() + +texts = ["These are 4 words", "These just three"] +l0 = [[1, 2], [3, 4], [5, 6], [7, 8]] +l1 = [[9, 8], [7, 6], [5, 4]] +out_list = [OPS.xp.asarray(l0, dtype="f"), OPS.xp.asarray(l1, dtype="f")] +a1 = OPS.xp.asarray(l1, dtype="f") + +# Test components with a model of type Model[List[Doc], List[Floats2d]] +@pytest.mark.parametrize("name", ["tagger", "tok2vec", "morphologizer", "senter"]) +def test_layers_batching_all_list(name): + nlp = English() + in_data = [nlp(text) for text in texts] + proc = nlp.create_pipe(name) + util_batch_unbatch_List(proc.model, in_data, out_list) + +def util_batch_unbatch_List(model: Model[List[Doc], List[Array2d]], in_data: List[Doc], out_data: List[Array2d]): + with data_validation(True): + model.initialize(in_data, out_data) + Y_batched = model.predict(in_data) + Y_not_batched = [model.predict([u])[0] for u in in_data] + assert_equal(Y_batched, Y_not_batched) + +# Test components with a model of type Model[List[Doc], Floats2d] +@pytest.mark.parametrize("name", ["textcat"]) +def test_layers_batching_all_array(name): + nlp = English() + in_data = [nlp(text) for text in texts] + proc = nlp.create_pipe(name) + 
util_batch_unbatch_Array(proc.model, in_data, a1) + +def util_batch_unbatch_Array(model: Model[List[Doc], Array2d], in_data: List[Doc], out_data: Array2d): + with data_validation(True): + model.initialize(in_data, out_data) + Y_batched = model.predict(in_data) + Y_not_batched = [model.predict([u])[0] for u in in_data] + assert_equal(Y_batched, Y_not_batched) \ No newline at end of file From 6ccacff54e4c279c0b37652119bd507ee466a5df Mon Sep 17 00:00:00 2001 From: svlandeg Date: Tue, 13 Oct 2020 18:50:07 +0200 Subject: [PATCH 16/31] add tests for individual spacy layers --- spacy/tests/pipeline/test_models.py | 98 +++++++++++++++++++++++------ 1 file changed, 80 insertions(+), 18 deletions(-) diff --git a/spacy/tests/pipeline/test_models.py b/spacy/tests/pipeline/test_models.py index d1c877953..12de9d23e 100644 --- a/spacy/tests/pipeline/test_models.py +++ b/spacy/tests/pipeline/test_models.py @@ -1,46 +1,108 @@ from typing import List + +import numpy import pytest -from numpy.testing import assert_equal +from numpy.testing import assert_almost_equal +from spacy.vocab import Vocab from thinc.api import get_current_ops, Model, data_validation -from thinc.types import Array2d +from thinc.types import Array2d, Ragged from spacy.lang.en import English +from spacy.ml import FeatureExtractor, StaticVectors +from spacy.ml._character_embed import CharacterEmbed from spacy.tokens import Doc OPS = get_current_ops() -texts = ["These are 4 words", "These just three"] +texts = ["These are 4 words", "Here just three"] l0 = [[1, 2], [3, 4], [5, 6], [7, 8]] l1 = [[9, 8], [7, 6], [5, 4]] -out_list = [OPS.xp.asarray(l0, dtype="f"), OPS.xp.asarray(l1, dtype="f")] -a1 = OPS.xp.asarray(l1, dtype="f") +list_floats = [OPS.xp.asarray(l0, dtype="f"), OPS.xp.asarray(l1, dtype="f")] +list_ints = [OPS.xp.asarray(l0, dtype="i"), OPS.xp.asarray(l1, dtype="i")] +array = OPS.xp.asarray(l1, dtype="f") +ragged = Ragged(array, OPS.xp.asarray([2, 1], dtype="i")) + + +def get_docs(): + vocab = Vocab() + 
for t in texts: + for word in t.split(): + hash_id = vocab.strings.add(word) + vector = numpy.random.uniform(-1, 1, (7,)) + vocab.set_vector(hash_id, vector) + docs = [English(vocab)(t) for t in texts] + return docs + # Test components with a model of type Model[List[Doc], List[Floats2d]] @pytest.mark.parametrize("name", ["tagger", "tok2vec", "morphologizer", "senter"]) -def test_layers_batching_all_list(name): +def test_components_batching_list(name): nlp = English() - in_data = [nlp(text) for text in texts] proc = nlp.create_pipe(name) - util_batch_unbatch_List(proc.model, in_data, out_list) + util_batch_unbatch_List(proc.model, get_docs(), list_floats) -def util_batch_unbatch_List(model: Model[List[Doc], List[Array2d]], in_data: List[Doc], out_data: List[Array2d]): - with data_validation(True): - model.initialize(in_data, out_data) - Y_batched = model.predict(in_data) - Y_not_batched = [model.predict([u])[0] for u in in_data] - assert_equal(Y_batched, Y_not_batched) # Test components with a model of type Model[List[Doc], Floats2d] @pytest.mark.parametrize("name", ["textcat"]) -def test_layers_batching_all_array(name): +def test_components_batching_array(name): nlp = English() in_data = [nlp(text) for text in texts] proc = nlp.create_pipe(name) - util_batch_unbatch_Array(proc.model, in_data, a1) + util_batch_unbatch_Array(proc.model, get_docs(), array) -def util_batch_unbatch_Array(model: Model[List[Doc], Array2d], in_data: List[Doc], out_data: Array2d): + +LAYERS = [ + (CharacterEmbed(nM=5, nC=3), get_docs(), list_floats), + (FeatureExtractor([100, 200]), get_docs(), list_ints), + (StaticVectors(), get_docs(), ragged), +] + + +@pytest.mark.parametrize("model,in_data,out_data", LAYERS) +def test_layers_batching_all(model, in_data, out_data): + # In = List[Doc] + if isinstance(in_data, list) and isinstance(in_data[0], Doc): + if isinstance(out_data, OPS.xp.ndarray) and out_data.ndim == 2: + util_batch_unbatch_Array(model, in_data, out_data) + elif ( + 
isinstance(out_data, list) + and isinstance(out_data[0], OPS.xp.ndarray) + and out_data[0].ndim == 2 + ): + util_batch_unbatch_List(model, in_data, out_data) + elif isinstance(out_data, Ragged): + util_batch_unbatch_Ragged(model, in_data, out_data) + + + +def util_batch_unbatch_List( + model: Model[List[Doc], List[Array2d]], in_data: List[Doc], out_data: List[Array2d] +): with data_validation(True): model.initialize(in_data, out_data) Y_batched = model.predict(in_data) Y_not_batched = [model.predict([u])[0] for u in in_data] - assert_equal(Y_batched, Y_not_batched) \ No newline at end of file + for i in range(len(Y_batched)): + assert_almost_equal(Y_batched[i], Y_not_batched[i], decimal=4) + + +def util_batch_unbatch_Array( + model: Model[List[Doc], Array2d], in_data: List[Doc], out_data: Array2d +): + with data_validation(True): + model.initialize(in_data, out_data) + Y_batched = model.predict(in_data).tolist() + Y_not_batched = [model.predict([u])[0] for u in in_data] + assert_almost_equal(Y_batched, Y_not_batched, decimal=4) + + +def util_batch_unbatch_Ragged( + model: Model[List[Doc], Ragged], in_data: List[Doc], out_data: Ragged +): + with data_validation(True): + model.initialize(in_data, out_data) + Y_batched = model.predict(in_data) + Y_not_batched = [] + for u in in_data: + Y_not_batched.extend(model.predict([u]).data.tolist()) + assert_almost_equal(Y_batched.data, Y_not_batched, decimal=4) From ff83bfae3f8bcf7c401af61bd04f4d8d0e6936a8 Mon Sep 17 00:00:00 2001 From: svlandeg Date: Tue, 13 Oct 2020 18:52:37 +0200 Subject: [PATCH 17/31] naming --- spacy/tests/pipeline/test_models.py | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/spacy/tests/pipeline/test_models.py b/spacy/tests/pipeline/test_models.py index 12de9d23e..b3982e714 100644 --- a/spacy/tests/pipeline/test_models.py +++ b/spacy/tests/pipeline/test_models.py @@ -39,7 +39,7 @@ def get_docs(): def test_components_batching_list(name): nlp = English() proc = 
nlp.create_pipe(name) - util_batch_unbatch_List(proc.model, get_docs(), list_floats) + util_batch_unbatch_docs_list(proc.model, get_docs(), list_floats) # Test components with a model of type Model[List[Doc], Floats2d] @@ -48,7 +48,7 @@ def test_components_batching_array(name): nlp = English() in_data = [nlp(text) for text in texts] proc = nlp.create_pipe(name) - util_batch_unbatch_Array(proc.model, get_docs(), array) + util_batch_unbatch_docs_array(proc.model, get_docs(), array) LAYERS = [ @@ -63,19 +63,19 @@ def test_layers_batching_all(model, in_data, out_data): # In = List[Doc] if isinstance(in_data, list) and isinstance(in_data[0], Doc): if isinstance(out_data, OPS.xp.ndarray) and out_data.ndim == 2: - util_batch_unbatch_Array(model, in_data, out_data) + util_batch_unbatch_docs_array(model, in_data, out_data) elif ( isinstance(out_data, list) and isinstance(out_data[0], OPS.xp.ndarray) and out_data[0].ndim == 2 ): - util_batch_unbatch_List(model, in_data, out_data) + util_batch_unbatch_docs_list(model, in_data, out_data) elif isinstance(out_data, Ragged): - util_batch_unbatch_Ragged(model, in_data, out_data) + util_batch_unbatch_docs_ragged(model, in_data, out_data) -def util_batch_unbatch_List( +def util_batch_unbatch_docs_list( model: Model[List[Doc], List[Array2d]], in_data: List[Doc], out_data: List[Array2d] ): with data_validation(True): @@ -86,7 +86,7 @@ def util_batch_unbatch_List( assert_almost_equal(Y_batched[i], Y_not_batched[i], decimal=4) -def util_batch_unbatch_Array( +def util_batch_unbatch_docs_array( model: Model[List[Doc], Array2d], in_data: List[Doc], out_data: Array2d ): with data_validation(True): @@ -96,7 +96,7 @@ def util_batch_unbatch_Array( assert_almost_equal(Y_batched, Y_not_batched, decimal=4) -def util_batch_unbatch_Ragged( +def util_batch_unbatch_docs_ragged( model: Model[List[Doc], Ragged], in_data: List[Doc], out_data: Ragged ): with data_validation(True): From ede979d42fa23e79f37af33dd725b03756af5447 Mon Sep 17 00:00:00 2001 
From: svlandeg Date: Tue, 13 Oct 2020 18:53:17 +0200 Subject: [PATCH 18/31] formattting --- spacy/tests/pipeline/test_models.py | 1 - 1 file changed, 1 deletion(-) diff --git a/spacy/tests/pipeline/test_models.py b/spacy/tests/pipeline/test_models.py index b3982e714..0d1309cd8 100644 --- a/spacy/tests/pipeline/test_models.py +++ b/spacy/tests/pipeline/test_models.py @@ -74,7 +74,6 @@ def test_layers_batching_all(model, in_data, out_data): util_batch_unbatch_docs_ragged(model, in_data, out_data) - def util_batch_unbatch_docs_list( model: Model[List[Doc], List[Array2d]], in_data: List[Doc], out_data: List[Array2d] ): From e94a21638e27aba51ad38660c6136becb9b4466f Mon Sep 17 00:00:00 2001 From: svlandeg Date: Tue, 13 Oct 2020 21:07:13 +0200 Subject: [PATCH 19/31] adding tests for trained models to ensure predict reproducibility --- spacy/tests/parser/test_ner.py | 16 ++++++++++++++++ spacy/tests/parser/test_parse.py | 16 ++++++++++++++++ spacy/tests/pipeline/test_entity_linker.py | 15 +++++++++++++++ spacy/tests/pipeline/test_models.py | 1 - spacy/tests/pipeline/test_morphologizer.py | 15 +++++++++++++++ spacy/tests/pipeline/test_senter.py | 17 +++++++++++++++++ spacy/tests/pipeline/test_tagger.py | 16 ++++++++++++++++ spacy/tests/pipeline/test_textcat.py | 9 +++++++++ 8 files changed, 104 insertions(+), 1 deletion(-) diff --git a/spacy/tests/parser/test_ner.py b/spacy/tests/parser/test_ner.py index b657ae2e8..b4c22b48d 100644 --- a/spacy/tests/parser/test_ner.py +++ b/spacy/tests/parser/test_ner.py @@ -1,4 +1,7 @@ import pytest +from numpy.testing import assert_equal +from spacy.attrs import ENT_IOB + from spacy import util from spacy.lang.en import English from spacy.language import Language @@ -332,6 +335,19 @@ def test_overfitting_IO(): assert ents2[0].text == "London" assert ents2[0].label_ == "LOC" + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = [ + "Just a sentence.", + "Then one more sentence about 
London.", + "Here is another one.", + "I like London.", + ] + batch_deps_1 = [doc.to_array([ENT_IOB]) for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.to_array([ENT_IOB]) for doc in nlp.pipe(texts)] + no_batch_deps = [doc.to_array([ENT_IOB]) for doc in [nlp(text) for text in texts]] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) + def test_ner_warns_no_lookups(caplog): nlp = English() diff --git a/spacy/tests/parser/test_parse.py b/spacy/tests/parser/test_parse.py index ffb6f23f1..a914eb17a 100644 --- a/spacy/tests/parser/test_parse.py +++ b/spacy/tests/parser/test_parse.py @@ -1,4 +1,7 @@ import pytest +from numpy.testing import assert_equal +from spacy.attrs import DEP + from spacy.lang.en import English from spacy.training import Example from spacy.tokens import Doc @@ -210,3 +213,16 @@ def test_overfitting_IO(): assert doc2[0].dep_ == "nsubj" assert doc2[2].dep_ == "dobj" assert doc2[3].dep_ == "punct" + + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = [ + "Just a sentence.", + "Then one more sentence about London.", + "Here is another one.", + "I like London.", + ] + batch_deps_1 = [doc.to_array([DEP]) for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.to_array([DEP]) for doc in nlp.pipe(texts)] + no_batch_deps = [doc.to_array([DEP]) for doc in [nlp(text) for text in texts]] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) diff --git a/spacy/tests/pipeline/test_entity_linker.py b/spacy/tests/pipeline/test_entity_linker.py index f2e6defcb..8ba2d0d3e 100644 --- a/spacy/tests/pipeline/test_entity_linker.py +++ b/spacy/tests/pipeline/test_entity_linker.py @@ -1,5 +1,7 @@ from typing import Callable, Iterable import pytest +from numpy.testing import assert_equal +from spacy.attrs import ENT_KB_ID from spacy.kb import KnowledgeBase, get_candidates, Candidate from spacy.vocab import Vocab @@ -496,6 +498,19 @@ def 
test_overfitting_IO(): predictions.append(ent.kb_id_) assert predictions == GOLD_entities + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = [ + "Russ Cochran captured his first major title with his son as caddie.", + "Russ Cochran his reprints include EC Comics.", + "Russ Cochran has been publishing comic art.", + "Russ Cochran was a member of University of Kentucky's golf team.", + ] + batch_deps_1 = [doc.to_array([ENT_KB_ID]) for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.to_array([ENT_KB_ID]) for doc in nlp.pipe(texts)] + no_batch_deps = [doc.to_array([ENT_KB_ID]) for doc in [nlp(text) for text in texts]] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) + def test_kb_serialization(): # Test that the KB can be used in a pipeline with a different vocab diff --git a/spacy/tests/pipeline/test_models.py b/spacy/tests/pipeline/test_models.py index 0d1309cd8..1ab5f7ea5 100644 --- a/spacy/tests/pipeline/test_models.py +++ b/spacy/tests/pipeline/test_models.py @@ -46,7 +46,6 @@ def test_components_batching_list(name): @pytest.mark.parametrize("name", ["textcat"]) def test_components_batching_array(name): nlp = English() - in_data = [nlp(text) for text in texts] proc = nlp.create_pipe(name) util_batch_unbatch_docs_array(proc.model, get_docs(), array) diff --git a/spacy/tests/pipeline/test_morphologizer.py b/spacy/tests/pipeline/test_morphologizer.py index fd7aa05be..85d1d6c8b 100644 --- a/spacy/tests/pipeline/test_morphologizer.py +++ b/spacy/tests/pipeline/test_morphologizer.py @@ -1,4 +1,5 @@ import pytest +from numpy.testing import assert_equal from spacy import util from spacy.training import Example @@ -6,6 +7,7 @@ from spacy.lang.en import English from spacy.language import Language from spacy.tests.util import make_tempdir from spacy.morphology import Morphology +from spacy.attrs import MORPH def test_label_types(): @@ -101,3 +103,16 @@ def test_overfitting_IO(): 
doc2 = nlp2(test_text) assert [str(t.morph) for t in doc2] == gold_morphs assert [t.pos_ for t in doc2] == gold_pos_tags + + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = [ + "Just a sentence.", + "Then one more sentence about London.", + "Here is another one.", + "I like London.", + ] + batch_deps_1 = [doc.to_array([MORPH]) for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.to_array([MORPH]) for doc in nlp.pipe(texts)] + no_batch_deps = [doc.to_array([MORPH]) for doc in [nlp(text) for text in texts]] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) diff --git a/spacy/tests/pipeline/test_senter.py b/spacy/tests/pipeline/test_senter.py index c9722e5de..7a256f79b 100644 --- a/spacy/tests/pipeline/test_senter.py +++ b/spacy/tests/pipeline/test_senter.py @@ -1,4 +1,6 @@ import pytest +from numpy.testing import assert_equal +from spacy.attrs import SENT_START from spacy import util from spacy.training import Example @@ -80,3 +82,18 @@ def test_overfitting_IO(): nlp2 = util.load_model_from_path(tmp_dir) doc2 = nlp2(test_text) assert [int(t.is_sent_start) for t in doc2] == gold_sent_starts + + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = [ + "Just a sentence.", + "Then one more sentence about London.", + "Here is another one.", + "I like London.", + ] + batch_deps_1 = [doc.to_array([SENT_START]) for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.to_array([SENT_START]) for doc in nlp.pipe(texts)] + no_batch_deps = [ + doc.to_array([SENT_START]) for doc in [nlp(text) for text in texts] + ] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) diff --git a/spacy/tests/pipeline/test_tagger.py b/spacy/tests/pipeline/test_tagger.py index b9db76cdf..885bdbce1 100644 --- a/spacy/tests/pipeline/test_tagger.py +++ b/spacy/tests/pipeline/test_tagger.py @@ -1,4 +1,7 @@ import pytest +from 
numpy.testing import assert_equal +from spacy.attrs import TAG + from spacy import util from spacy.training import Example from spacy.lang.en import English @@ -117,6 +120,19 @@ def test_overfitting_IO(): assert doc2[2].tag_ is "J" assert doc2[3].tag_ is "N" + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = [ + "Just a sentence.", + "I like green eggs.", + "Here is another one.", + "I eat ham.", + ] + batch_deps_1 = [doc.to_array([TAG]) for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.to_array([TAG]) for doc in nlp.pipe(texts)] + no_batch_deps = [doc.to_array([TAG]) for doc in [nlp(text) for text in texts]] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) + def test_tagger_requires_labels(): nlp = English() diff --git a/spacy/tests/pipeline/test_textcat.py b/spacy/tests/pipeline/test_textcat.py index dd2f1070b..91348b1b3 100644 --- a/spacy/tests/pipeline/test_textcat.py +++ b/spacy/tests/pipeline/test_textcat.py @@ -1,6 +1,7 @@ import pytest import random import numpy.random +from numpy.testing import assert_equal from thinc.api import fix_random_seed from spacy import util from spacy.lang.en import English @@ -174,6 +175,14 @@ def test_overfitting_IO(): assert scores["cats_score"] == 1.0 assert "cats_score_desc" in scores + # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions + texts = ["Just a sentence.", "I like green eggs.", "I am happy.", "I eat ham."] + batch_deps_1 = [doc.cats for doc in nlp.pipe(texts)] + batch_deps_2 = [doc.cats for doc in nlp.pipe(texts)] + no_batch_deps = [doc.cats for doc in [nlp(text) for text in texts]] + assert_equal(batch_deps_1, batch_deps_2) + assert_equal(batch_deps_1, no_batch_deps) + # fmt: off @pytest.mark.parametrize( From 0796401c1955fc3508b2d1f50b402b492fa690b2 Mon Sep 17 00:00:00 2001 From: svlandeg Date: Wed, 14 Oct 2020 16:55:00 +0200 Subject: [PATCH 20/31] call NumpyOps instead 
of get_current_ops() --- spacy/tests/pipeline/test_models.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/spacy/tests/pipeline/test_models.py b/spacy/tests/pipeline/test_models.py index 1ab5f7ea5..d04ac9cd4 100644 --- a/spacy/tests/pipeline/test_models.py +++ b/spacy/tests/pipeline/test_models.py @@ -4,7 +4,7 @@ import numpy import pytest from numpy.testing import assert_almost_equal from spacy.vocab import Vocab -from thinc.api import get_current_ops, Model, data_validation +from thinc.api import NumpyOps, Model, data_validation from thinc.types import Array2d, Ragged from spacy.lang.en import English @@ -12,7 +12,8 @@ from spacy.ml import FeatureExtractor, StaticVectors from spacy.ml._character_embed import CharacterEmbed from spacy.tokens import Doc -OPS = get_current_ops() + +OPS = NumpyOps() texts = ["These are 4 words", "Here just three"] l0 = [[1, 2], [3, 4], [5, 6], [7, 8]] From abeafcbc083dc879376aefb9d84dd9b01ab4cc52 Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 08:58:30 +0200 Subject: [PATCH 21/31] Update docs [ci skip] --- website/docs/usage/_benchmarks-models.md | 11 +++++---- website/docs/usage/facts-figures.md | 30 +++++++++++++++++++++--- 2 files changed, 33 insertions(+), 8 deletions(-) diff --git a/website/docs/usage/_benchmarks-models.md b/website/docs/usage/_benchmarks-models.md index becd313f4..4e6da9ad8 100644 --- a/website/docs/usage/_benchmarks-models.md +++ b/website/docs/usage/_benchmarks-models.md @@ -24,8 +24,7 @@ import { Help } from 'components/typography'; import Link from 'components/link' | Named Entity Recognition System | OntoNotes | CoNLL '03 | | ------------------------------------------------------------------------------ | --------: | --------: | | spaCy RoBERTa (2020) | 89.7 | 91.6 | -| spaCy CNN (2020) | 84.5 | | -| spaCy CNN (2017) | | | +| spaCy CNN (2020) | 84.5 | 87.4 | | [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)1 | 88.8 | 92.1 | | Flair2 | 89.7 | 93.1 | 
| BERT Base3 | - | 92.4 | @@ -36,9 +35,11 @@ import { Help } from 'components/typography'; import Link from 'components/link' [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) and [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419.pdf) corpora. See [NLP-progress](http://nlpprogress.com/english/named_entity_recognition.html) for -more results. **1. ** [Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). -**2. ** [Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/). **3. -** [Devlin et al. (2018)](https://arxiv.org/abs/1810.04805). +more results. Project template: +[`benchmarks/ner_conll03`](%%GITHUB_PROJECTS/benchmarks/ner_conll03). **1. ** +[Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). **2. ** +[Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/). **3. ** +[Devlin et al. (2018)](https://arxiv.org/abs/1810.04805). diff --git a/website/docs/usage/facts-figures.md b/website/docs/usage/facts-figures.md index 2707f68fa..c7a7d0525 100644 --- a/website/docs/usage/facts-figures.md +++ b/website/docs/usage/facts-figures.md @@ -65,8 +65,8 @@ import Benchmarks from 'usage/\_benchmarks-models.md' | Dependency Parsing System | UAS | LAS | | ------------------------------------------------------------------------------ | ---: | ---: | -| spaCy RoBERTa (2020)1 | 95.5 | 94.3 | -| spaCy CNN (2020)1 | | | +| spaCy RoBERTa (2020) | 95.5 | 94.3 | +| spaCy CNN (2020) | | | | [Mrini et al.](https://khalilmrini.github.io/Label_Attention_Layer.pdf) (2019) | 97.4 | 96.3 | | [Zhou and Zhao](https://www.aclweb.org/anthology/P19-1230/) (2019) | 97.2 | 95.7 | @@ -74,13 +74,37 @@ import Benchmarks from 'usage/\_benchmarks-models.md' **Dependency parsing accuracy** on the Penn Treebank. See [NLP-progress](http://nlpprogress.com/english/dependency_parsing.html) for more -results. **1. ** Project template: +results. Project template: [`benchmarks/parsing_penn_treebank`](%%GITHUB_PROJECTS/benchmarks/parsing_penn_treebank). 
+### Speed comparison {#benchmarks-speed}
+
+| Library | Pipeline                                        | WPS CPU words per second on CPU, higher is better | WPS GPU words per second on GPU, higher is better |
+| ------- | ----------------------------------------------- | ------------------------------------------------: | ------------------------------------------------: |
+| spaCy   | [`en_core_web_md`](/models/en#en_core_web_md)   |                                                    |                                                    |
+| spaCy   | [`en_core_web_trf`](/models/en#en_core_web_trf) |                                                    |                                                    |
+| Stanza  | `en_ewt`                                        |                                                    |                                                    |
+| Flair   | `pos-fast_ner-fast`                             |                                                    |                                                    |
+| Flair   | `pos_ner`                                       |                                                    |                                                    |
+| UDPipe  | `english-ewt-ud-2.5`                            |                                                    |                                                    |
+
+**End-to-end processing speed** on raw unannotated text. Project template:
+[`benchmarks/speed`](%%GITHUB_PROJECTS/benchmarks/speed).
+
+
+

From 5d62499266e2ba95e5d4b669c89ca1a6580ed798 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Thu, 15 Oct 2020 09:29:15 +0200
Subject: [PATCH 22/31] Fix tests

---
 spacy/tests/conftest.py                  | 16 ++---------
 spacy/tests/lang/hi/test_lex_attrs.py    |  9 ++----
 spacy/tests/regression/test_issue6177.py | 35 ------------------------
 3 files changed, 5 insertions(+), 55 deletions(-)
 delete mode 100644 spacy/tests/regression/test_issue6177.py

diff --git a/spacy/tests/conftest.py b/spacy/tests/conftest.py
index 2cbfa5ee2..2d34cf0d5 100644
--- a/spacy/tests/conftest.py
+++ b/spacy/tests/conftest.py
@@ -127,7 +127,7 @@ def he_tokenizer():
 
 @pytest.fixture(scope="session")
 def hi_tokenizer():
-    return get_lang_class("hi").Defaults.create_tokenizer()
+    return get_lang_class("hi")().tokenizer
 
 
 @pytest.fixture(scope="session")
@@ -245,14 +245,6 @@ def tr_tokenizer():
     return get_lang_class("tr")().tokenizer
 
 
-@pytest.fixture(scope="session")
-def tr_vocab():
-    return get_lang_class("tr").Defaults.create_vocab()
-
-@pytest.fixture(scope="session")
-def tr_vocab():
-    return get_lang_class("tr").Defaults.create_vocab()
-
 @pytest.fixture(scope="session")
 def tt_tokenizer():
     return get_lang_class("tt")().tokenizer
@@ -305,11 +297,7 @@ def zh_tokenizer_pkuseg():
                 "segmenter": "pkuseg",
             }
         },
-        "initialize": {
-            "tokenizer": {
-                "pkuseg_model": "web",
-            }
-        },
+        "initialize": {"tokenizer": {"pkuseg_model": "web",}},
     }
     nlp = get_lang_class("zh").from_config(config)
     nlp.initialize()

diff --git a/spacy/tests/lang/hi/test_lex_attrs.py b/spacy/tests/lang/hi/test_lex_attrs.py
index e3cfffb89..187a23cb4 100644
--- a/spacy/tests/lang/hi/test_lex_attrs.py
+++ b/spacy/tests/lang/hi/test_lex_attrs.py
@@ -1,15 +1,12 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
 import pytest
 from spacy.lang.hi.lex_attrs import norm, like_num
 
 
 def test_hi_tokenizer_handles_long_text(hi_tokenizer):
     text = """
-ये कहानी 1900 के दशक की है। कौशल्या (स्मिता जयकर) को पता चलता है कि उसका
-छोटा बेटा, देवदास (शाहरुख खान) वापस घर आ रहा है। देवदास 10 साल पहले कानून की
-पढ़ाई करने के लिए इंग्लैंड गया था। उसके लौटने की खुशी में ये बात कौशल्या अपनी पड़ोस
+ये कहानी 1900 के दशक की है। कौशल्या (स्मिता जयकर) को पता चलता है कि उसका
+छोटा बेटा, देवदास (शाहरुख खान) वापस घर आ रहा है। देवदास 10 साल पहले कानून की
+पढ़ाई करने के लिए इंग्लैंड गया था। उसके लौटने की खुशी में ये बात कौशल्या अपनी पड़ोस
 में रहने वाली सुमित्रा (किरण खेर) को भी बता देती है। इस खबर से वो भी खुश हो जाती है।
 """
     tokens = hi_tokenizer(text)

diff --git a/spacy/tests/regression/test_issue6177.py b/spacy/tests/regression/test_issue6177.py
deleted file mode 100644
index c806011c3..000000000
--- a/spacy/tests/regression/test_issue6177.py
+++ /dev/null
@@ -1,35 +0,0 @@
-# coding: utf8
-from __future__ import unicode_literals
-
-from spacy.lang.en import English
-from spacy.util import fix_random_seed
-
-
-def test_issue6177():
-    """Test that after fixing the random seed, the results of the pipeline are truly identical"""
-
-    # NOTE: no need to transform this code to v3 when 'master' is merged into 'develop'.
-    # A similar test exists already for v3: test_issue5551
-    # This is just a backport
-
-    results = []
-    for i in range(3):
-        fix_random_seed(0)
-        nlp = English()
-        example = (
-            "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.",
-            {"cats": {"Labe1": 1.0, "Label2": 0.0, "Label3": 0.0}},
-        )
-        textcat = nlp.create_pipe("textcat")
-        nlp.add_pipe(textcat)
-        for label in set(example[1]["cats"]):
-            textcat.add_label(label)
-        nlp.begin_training()
-        # Store the result of each iteration
-        result = textcat.model.predict([nlp.make_doc(example[0])])
-        results.append(list(result[0]))
-
-    # All results should be the same because of the fixed seed
-    assert len(results) == 3
-    assert results[0] == results[1]
-    assert results[0] == results[2]
\ No newline at end of file

From 5665a21517a48245e2846b194425b9bf2399145c Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Thu, 15 Oct 2020 09:30:32 +0200
Subject: [PATCH 23/31] Tidy up

---
 spacy/lang/tr/morph_rules.py      | 3905 -----------------------------
 spacy/lang/tr/syntax_iterators.py |    3 -
 2 files changed, 3908 deletions(-)
 delete mode 100644 spacy/lang/tr/morph_rules.py

diff --git a/spacy/lang/tr/morph_rules.py b/spacy/lang/tr/morph_rules.py
deleted file mode 100644
index 02302c504..000000000
--- a/spacy/lang/tr/morph_rules.py
+++ /dev/null
@@ -1,3905 +0,0 @@
-# coding: utf8
-from __future__ import unicode_literals
-
-from ...symbols import LEMMA, PRON_LEMMA
-
-_adverbs = [
-    "apansızın",
-    "aslen",
-    "aynen",
-    "ayrıyeten",
-    "basbayağı",
-    "başaşağı",
-    "belki",
-    "çatkapı",
-    "demin",
-    "derhal",
-    "doyasıya",
-    "düpedüz",
-    "ebediyen",
-    "elbet",
-    "elbette",
-    "enikonu",
-    "epey",
-    "epeyce",
-    "epeydir",
-    "esasen",
-    "evvela",
-    "galiba",
-    "gayet",
-    "genellikle",
-    "gerçekten",
-    "gerisingeri",
-    "giderayak",
-    "gitgide",
-    "gıyaben",
-    "gözgöze",
-    "güçbela",
-    "gündüzleyin",
-    "güya",
-    "habire",
-    "hakikaten",
-    "hakkaten",
-    "halen",
-    "halihazırda",
-    "harfiyen",
-    "haricen",
-
"hasbelkader", - "hemen", - "henüz", - "hep", - "hepten", - "herhalde", - "hiç", - "hükmen", - "ihtiyaten", - "illaki", -"ismen", - "iştiraken", - "izafeten", - "kalben", - "kargatulumba", - "kasten", - "katiyen", - "katiyyen", - "kazara", - "kefaleten", - "kendiliğinden", - "kerhen", - "kesinkes", - "kesinlikle", - "keşke", - "kimileyin", - "külliyen", - "layıkıyla", - "maalesef", - "mahsusçuktan", - "masumane", - "malulen", - "mealen", - "mecazen", - "mecburen", - "muhakkak", - "muhtemelen", - "mutlaka", - "müstacelen", - "müştereken", - "müteakiben", - "naçizane", - "nadiren", - "nakden", - "naklen", - "nazikane", - "nerdeyse", - "neredeyse", - "nispeten", - "nöbetleşe", - "olabildiğince", - "olduğunca", - "ortaklaşa", - "otomatikman", - "öğlenleyin", - "öğleyin", - "öldüresiye", - "ölesiye", - "örfen", - "öyle", - "öylesine", - "özellikle", - "peşinen", - "peşpeşe", - "peyderpey", - "ruhen", - "sadece", - "sahi", - "sahiden", - "salt", - "salimen", - "sanırım", - "sanki", - "sehven", - "senlibenli", - "sereserpe", - "sırf", - "sözgelimi", - "sözgelişi", - "şahsen", - "şakacıktan", - "şeklen", - "şıppadak", - "şimdilik", - "şipşak", - "tahminen", - "takdiren", - "takiben", - "tamamen", - "tamamiyle", - "tedbiren", - "temsilen", - "tepetaklak", - "tercihen", - "tesadüfen", - "tevekkeli", - "tezelden", - "tıbben", - "tıkabasa", - "tıpatıp", - "toptan", - "tümüyle", - "uluorta", - "usulcacık", - "usulen", - "üstünkörü", - "vekaleten", - "vicdanen", - "yalancıktan", - "yavaşçacık", - "yekten", - "yeniden", - "yeterince", - "yine", - "yüzükoyun", - "yüzüstü", - "yüzyüze", - "zaten", - "zımmen", - "zihnen", - "zilzurna" - ] - -_postpositions = [ - "geçe", - "gibi", - "göre", - "ilişkin", - "kadar", - "kala", - "karşın", - "nazaran" - "rağmen", - "üzere" - ] - -_subordinating_conjunctions = [ - "eğer", - "madem", - "mademki", - "şayet" - ] - -_coordinating_conjunctions = [ - "ama", - "hem", - "fakat", - "ila", - "lakin", - "ve", - "veya", - "veyahut" - ] - -MORPH_RULES 
= { - "ADP": {word: {"POS": "ADP"} for word in _postpositions}, - "ADV": {word: {"POS": "ADV"} for word in _adverbs}, - "SCONJ": {word: {"POS": "SCONJ"} for word in _subordinating_conjunctions}, - "CCONJ": {word: {"POS": "CCONJ"} for word in _coordinating_conjunctions}, - "PRON": { - "bana": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Dat" - }, - "benden": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Abl" - }, - "bende": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Loc" - }, - "beni": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Acc" - }, - "benle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Ins" - }, - "ben": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Nom" - }, - "benim": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Gen" - }, - "benimle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Sing", - "Case": "Ins" - }, - "sana": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Dat" - }, - "senden": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Abl" - }, - "sende": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Loc" - }, - "seni": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Acc" - }, - "senle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - 
"Person": "Two", - "Number": "Sing", - "Case": "Ins" - }, - "sen": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Nom" - }, - "senin": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Gen" - }, - "seninle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Sing", - "Case": "Ins" - }, - "ona": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Dat" - }, - "ondan": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Abl" - }, - "onda": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "", - "Number": "", - "Case": "Loc" - }, - "onu": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Acc" - }, - "onla": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Ins" - }, - "o": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Nom" - }, - "onun": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Gen" - }, - "onunla": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Sing", - "Case": "Ins" - }, - "bize": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Dat" - }, - "bizden": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Abl" - }, - "bizde": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Loc" - }, - "bizi": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Acc" - }, - "bizle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": 
"Prs", - "Person": "One", - "Number": "Plur", - "Case": "Ins" - }, - "biz": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Nom" - }, - "bizim": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Gen" - }, - "bizimle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "One", - "Number": "Plur", - "Case": "Ins" - }, - "size": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Dat" - }, - "sizden": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Abl" - }, - "sizde": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Loc" - }, - "sizi": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Acc" - }, - "sizle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Ins" - }, - "siz": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Nom" - }, - "sizin": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Gen" - }, - "sizinle": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "PronType": "Prs", - "Person": "Two", - "Number": "Plur", - "Case": "Ins" - }, - "onlara": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Plur", - "Case": "Dat" - }, - "onlardan": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Plur", - "Case": "Abl" - }, - "onlarda": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Plur", - "Case": "Loc" - }, - "onları": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - 
"Number": "Plur", - "Case": "Acc" - }, - "onlarla": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Plur", - "Case": "Ins" - }, - "onlar": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Plur", - "Case": "Nom" - }, - "onların": { - "LEMMA": "PRON_LEMMA", - "POS": "PRON", - "Person": "Three", - "Number": "Plur", - "Case": "Gen" - }, - "buna": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Dat" - }, - "bundan": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Abl" - }, - "bunda": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Loc" - }, - "bunu": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Acc" - }, - "bunla": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "bu": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Nom" - }, - "bunun": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Gen" - }, - "bununla": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "şuna": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Dat" - }, - "şundan": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Abl" - }, - "şunda": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Loc" - }, - "şunu": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Acc" - }, - "şunla": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "şu": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Nom" - }, - "şunun": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": 
"Sing", - "Case": "Gen" - }, - "şununla": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "bunlara": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Dat" - }, - "bunlardan": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Abl" - }, - "bunlarda": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Loc" - }, - "bunları": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Acc" - }, - "bunlarla": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Ins" - }, - "bunlar": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Nom" - }, - "bunların": { - "LEMMA": "bu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Gen" - }, - "şunlara": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Dat" - }, - "şunlardan": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Abl" - }, - "şunlarda": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Loc" - }, - "şunları": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Acc" - }, - "şunlarla": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Ins" - }, - "şunlar": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Nom" - }, - "şunların": { - "LEMMA": "şu", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Gen" - }, - "buraya": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Dat" - }, - "buradan": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Abl" - }, - "burada": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - 
"Number": "Sing", - "Case": "loc.sg" - }, - "burayı": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Acc" - }, - "burayla": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "bura": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Nom" - }, - "buranın": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Gen" - }, - "şuraya": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Dat" - }, - "şuradan": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Abl" - }, - "şurada": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "loc.sg" - }, - "şurayı": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Acc" - }, - "şurayla": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "şura": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Nom" - }, - "şuranın": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Gen" - }, - "oraya": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Dat" - }, - "oradan": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Abl" - }, - "orada": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "loc.sg" - }, - "orayı": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Acc" - }, - "orayla": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Ins" - }, - "ora": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Sing", - "Case": "Nom" - }, - "oranın": { - "LEMMA": "ora", - "POS": "PRON", - 
"PronType": "Dem", - "Number": "Sing", - "Case": "Gen" - }, - "buralarına": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Dat" - }, - "buralarından": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Abl" - }, - "buralarında": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Loc" - }, - "buralarını": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Acc" - }, - "buralarıyla": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Ins" - }, - "buraları": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Nom" - }, - "buralarının": { - "LEMMA": "bura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Gen" - }, - "şuralarına": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Dat" - }, - "şuralarından": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Abl" - }, - "şuralarında": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Loc" - }, - "şuralarını": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Acc" - }, - "şuralarıyla": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Ins" - }, - "şuraları": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Nom" - }, - "şuralarının": { - "LEMMA": "şura", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Gen" - }, - "oralarına": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Dat" - }, - "oralarından": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Abl" - }, - "oralarında": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - 
"Number": "Plur", - "Case": "Loc" - }, - "oralarını": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Acc" - }, - "oralarıyla": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Ins" - }, - "oraları": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Nom" - }, - "oralarının": { - "LEMMA": "ora", - "POS": "PRON", - "PronType": "Dem", - "Number": "Plur", - "Case": "Gen" - }, - "kendime": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Dat", - "Number": "Sing" - }, - "kendimden": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Abl", - "Number": "Sing" - }, - "kendimde": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Loc", - "Number": "Sing" - }, - "kendimi": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Acc", - "Number": "Sing" - }, - "kendimle": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Ins", - "Number": "Sing" - }, - "kendim": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Nom", - "Number": "Sing" - }, - "kendimin": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Gen", - "Number": "Sing" - }, - "kendine": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Dat", - "Number": "Sing" - }, - "kendinden": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Abl", - "Number": "Sing" - }, - "kendinde": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Loc", - 
"Number": "Sing" - }, - "kendini": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Acc", - "Number": "Sing" - }, - "kendiyle": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Ins", - "Number": "Sing" - }, - "kendi": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Nom", - "Number": "Sing" - }, - "kendinin": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Gen", - "Number": "Sing" - }, - "kendisine": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Dat", - "Number": "Sing" - }, - "kendisinden": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Abl", - "Number": "Sing" - }, - "kendisinde": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Loc", - "Number": "Sing" - }, - "kendisini": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Acc", - "Number": "Sing" - }, - "kendisiyle": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Ins", - "Number": "Sing" - }, - "kendisi": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Nom", - "Number": "Sing" - }, - "kendisinin": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Gen", - "Number": "Sing" - }, - "kendimize": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Dat", - "Number": "Sing" - }, - "kendimizden": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - 
"Case": "Abl", - "Number": "Sing" - }, - "kendimizde": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Loc", - "Number": "Sing" - }, - "kendimizi": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Acc", - "Number": "Sing" - }, - "kendimizle": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Ins", - "Number": "Sing" - }, - "kendimiz": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Nom", - "Number": "Sing" - }, - "kendimizin": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "One", - "Case": "Gen", - "Number": "Sing" - }, - "kendinize": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Dat", - "Number": "Sing" - }, - "kendinizden": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Abl", - "Number": "Sing" - }, - "kendinizde": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Loc", - "Number": "Sing" - }, - "kendinizi": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Acc", - "Number": "Sing" - }, - "kendinizle": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Ins", - "Number": "Sing" - }, - "kendiniz": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Nom", - "Number": "Sing" - }, - "kendinizin": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Two", - "Case": "Gen", - "Number": "Sing" - }, - "kendilerine": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - 
"Person": "Three", - "Case": "Dat", - "Number": "Sing" - }, - "kendilerinden": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Abl", - "Number": "Sing" - }, - "kendilerinde": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Loc", - "Number": "Sing" - }, - "kendilerini": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Acc", - "Number": "Sing" - }, - "kendileriyle": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Ins", - "Number": "Sing" - }, - "kendileriyken": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Nom", - "Number": "Sing" - }, - "kendilerinin": { - "LEMMA": "kendi", - "POS": "PRON", - "PronType": "Prs", - "Reflex": "Yes", - "Person": "Three", - "Case": "Gen", - "Number": "Sing" - }, - "hangilerine": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - "hangilerinden": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Sing" - }, - "hangilerinde": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "hangilerini": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "hangileriyle": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "hangileri": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "hangilerinin": { - "LEMMA": "hangileri", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "hangisine": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - 
"hangisinden": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Sing" - }, - "hangisinde": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "hangisini": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "hangisiyle": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "hangisi": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "hangisinin": { - "LEMMA": "hangi", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "kime": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - "kimden": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Sing" - }, - "kimde": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "kimi": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "kimle": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "kim": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "kimin": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "kimlere": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Plur" - }, - "kimlerden": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Plur" - }, - "kimlerde": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Plur" - }, - "kimleri": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Plur" - }, - "kimlerle": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": 
"Ins", - "Number": "Plur" - }, - "kimler": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Plur" - }, - "kimlerin": { - "LEMMA": "kim", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Plur" - }, - "neye": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - "neden": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Sing" - }, - "nede": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "neyi": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "neyle": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "ne": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "neyin": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "nelere": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Plur" - }, - "nelerden": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Plur" - }, - "nelerde": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Plur" - }, - "neleri": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Plur" - }, - "nelerle": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Plur" - }, - "neler": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Plur" - }, - "nelerin": { - "LEMMA": "ne", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Plur" - }, - "nereye": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - "nereden": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": 
"Sing" - }, - "nerede": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "nereyi": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "nereyle": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "nere": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "nerenin": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "nerelere": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Plur" - }, - "nerelerden": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Plur" - }, - "nerelerde": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Plur" - }, - "nereleri": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Plur" - }, - "nerelerle": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Plur" - }, - "nereler": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Plur" - }, - "nerelerin": { - "LEMMA": "nere", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Plur" - }, - "kaçlarına": { - "LEMMA": "kaçları", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - "kaçlarından": { - "LEMMA": "kaçları", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Sing" - }, - "kaçlarında": { - "LEMMA": "kaçları", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "kaçlarını": { - "LEMMA": "kaçları", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "kaçlarıyla": { - "LEMMA": "kaçları", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "kaçları": { - "LEMMA": 
"kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "kaçlarının": { - "LEMMA": "kaçları", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "kaçına": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Dat", - "Number": "Sing" - }, - "kaçından": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Abl", - "Number": "Sing" - }, - "kaçında": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Loc", - "Number": "Sing" - }, - "kaçını": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Acc", - "Number": "Sing" - }, - "kaçıyla": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Ins", - "Number": "Sing" - }, - "kaçı": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Nom", - "Number": "Sing" - }, - "kaçının": { - "LEMMA": "kaçı", - "POS": "PRON", - "PronType": "Int", - "Case": "Gen", - "Number": "Sing" - }, - "başkasına": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "başkasından": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "başkasında": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "başkasını": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "başkasıyla": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "başkası": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "başkasının": { - "LEMMA": "başkası", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "başkalarına": { - "LEMMA": "başkaları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "başkalarından": { - "LEMMA": "başkaları", - "POS": 
"PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "başkalarında": { - "LEMMA": "başkaları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "başkalarını": { - "LEMMA": "başkaları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "başkalarıyla": { - "LEMMA": "başkaları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "başkaları": { - "LEMMA": "başkaları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "başkalarının": { - "LEMMA": "başkaları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "bazısına": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "bazısından": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "bazısında": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "bazısını": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "bazısıyla": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "bazısı": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "bazısının": { - "LEMMA": "bazısı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "bazılarına": { - "LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "bazılarından": { - "LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "bazılarında": { - "LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "bazılarını": { - "LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "bazılarıyla": { - 
"LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "bazıları": { - "LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "bazılarının": { - "LEMMA": "bazıları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "birbirine": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Dat", - "Number": "Sing" - }, - "birbirinden": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Abl", - "Number": "Sing" - }, - "birbirinde": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Loc", - "Number": "Sing" - }, - "birbirini": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Acc", - "Number": "Sing" - }, - "birbiriyle": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Ins", - "Number": "Sing" - }, - "birbiri": { - "LEMMA": "birbiri", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Nom", - "Number": "Sing" - }, - "birbirinin": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Gen", - "Number": "Sing" - }, - "birbirlerine": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Dat", - "Number": "Sing" - }, - "birbirlerinden": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Abl", - "Number": "Sing" - }, - "birbirlerinde": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Loc", - "Number": "Sing" - }, - "birbirlerini": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Acc", - "Number": "Sing" - }, - "birbirleriyle": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Ins", - "Number": "Sing" - }, - "birbirleri": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Nom", - "Number": "Sing" - }, - "birbirlerinin": { - "LEMMA": "birbir", - "POS": "PRON", - "PronType": "Rcp", - "Case": "Gen", - "Number": "Sing" 
- }, - "birçoğuna": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "birçoğundan": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "birçoğunda": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "birçoğunu": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "birçoğuyla": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "birçoğu": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "birçoğunun": { - "LEMMA": "birçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "birçoklarına": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "birçoklarından": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "birçoklarında": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "birçoklarını": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "birçoklarıyla": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "birçokları": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "birçoklarının": { - "LEMMA": "birçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "birilerine": { - "LEMMA": "birileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "birilerinden": { - "LEMMA": "birileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "birilerinde": { - "LEMMA": "birileri", - "POS": 
"PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "birilerini": { - "LEMMA": "birileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "birileriyle": { - "LEMMA": "birileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "birileri": { - "LEMMA": "birileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "birilerinin": { - "LEMMA": "birileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "birisine": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "birisinden": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "birisinde": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "birisini": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "birisiyle": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "birisi": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "birisinin": { - "LEMMA": "biri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "birkaçına": { - "LEMMA": "birkaçı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "birkaçından": { - "LEMMA": "birkaçı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "birkaçında": { - "LEMMA": "birkaçı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "birkaçını": { - "LEMMA": "birkaçı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "birkaçıyla": { - "LEMMA": "birkaçı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "birkaçı": { - "LEMMA": "birkaç", - "POS": "PRON", - 
"PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "birkaçının": { - "LEMMA": "birkaçı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "birtakımına": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "birtakımından": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "birtakımında": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "birtakımını": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "birtakımıyla": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "birtakımı": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "birtakımının": { - "LEMMA": "birtakımı", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "böylesine": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "böylesinden": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "böylesinde": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "böylesini": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "böylesiyle": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "böylesi": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "böylesinin": { - "LEMMA": "böylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "şöylesine": { - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "şöylesinden": 
{ - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "şöylesinde": { - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "şöylesini": { - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "şöylesiyle": { - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "şöylesi": { - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "şöylesinin": { - "LEMMA": "şöylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "öylesine": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "öylesinden": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "öylesinde": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "öylesini": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "öylesiyle": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "öylesi": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "öylesinin": { - "LEMMA": "öylesi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "böylelerine": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "böylelerinden": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "böylelerinde": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "böylelerini": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, 
- "böyleleriyle": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "böyleleri": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "böylelerinin": { - "LEMMA": "böyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "şöylelerine": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "şöylelerinden": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "şöylelerinde": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "şöylelerini": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "şöyleleriyle": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "şöyleleri": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "şöylelerinin": { - "LEMMA": "şöyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "öylelerine": { - "LEMMA": "öyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "öylelerinden": { - "LEMMA": "öyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "öylelerinde": { - "LEMMA": "öyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "öylelerini": { - "LEMMA": "öyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "öyleleriyle": { - "LEMMA": "öyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "öyleleri": { - "LEMMA": "öyleleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "öylelerinin": { - "LEMMA": "öyleleri", - "POS": 
"PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "çoklarına": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "çoklarından": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "çoklarında": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "çoklarını": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "çoklarıyla": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "çokları": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "çoklarının": { - "LEMMA": "çokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "çoğuna": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "çoğundan": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "çoğunda": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "çoğunu": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "çoğuyla": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "çoğu": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "çoğunun": { - "LEMMA": "çoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "diğerine": { - "LEMMA": "diğeri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "diğerinden": { - "LEMMA": "diğeri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "diğerinde": { - "LEMMA": "diğeri", - "POS": "PRON", - "PronType": "Ind", - 
"Case": "Loc", - "Number": "Sing" - }, - "diğerini": { - "LEMMA": "diğeri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "diğeriyle": { - "LEMMA": "diğeri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "diğeri": { - "LEMMA": "diğer", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "diğerinin": { - "LEMMA": "diğeri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "diğerlerine": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "diğerlerinden": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "diğerlerinde": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "diğerlerini": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "diğerleriyle": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "diğerleri": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "diğerlerinin": { - "LEMMA": "diğerleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "hepinize": { - "LEMMA": "hepiniz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "hepinizden": { - "LEMMA": "hepiniz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "hepinizde": { - "LEMMA": "hepiniz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "hepinizi": { - "LEMMA": "hepiniz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "hepinizle": { - "LEMMA": "hepiniz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "hepiniz": { - "LEMMA": "hepiniz", - "POS": 
"PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "hepinizin": { - "LEMMA": "hepiniz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "hepimize": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "hepimizden": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "hepimizde": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "hepimizi": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "hepimizle": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "hepimiz": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "hepimizin": { - "LEMMA": "hepimiz", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "hepsine": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "hepsinden": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "hepsinde": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "hepsini": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "hepsiyle": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "hepsi": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "hepsinin": { - "LEMMA": "hepsi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "herbirine": { - "LEMMA": "herbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "herbirinden": { - "LEMMA": "herbiri", - "POS": "PRON", - 
"PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "herbirinde": { - "LEMMA": "herbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "herbirini": { - "LEMMA": "herbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "herbiriyle": { - "LEMMA": "herbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "herbiri": { - "LEMMA": "herbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "herbirinin": { - "LEMMA": "herbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "herbirlerine": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "herbirlerinden": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "herbirlerinde": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "herbirlerini": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "herbirleriyle": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "herbirleri": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "herbirlerinin": { - "LEMMA": "herbirleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "herhangisine": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "herhangisinden": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "herhangisinde": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "herhangisini": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - 
"Number": "Sing" - }, - "herhangisiyle": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "herhangisi": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "herhangisinin": { - "LEMMA": "herhangisi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "herhangilerine": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "herhangilerinden": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "herhangilerinde": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "herhangilerini": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "herhangileriyle": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "herhangileri": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "herhangilerinin": { - "LEMMA": "herhangileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "herkese": { - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "herkesten": { - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "herkeste": { - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "herkesi": { - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "herkesle": { - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "herkes": { - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "herkesin": 
{ - "LEMMA": "herkes", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "hiçbirisine": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "hiçbirisinden": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "hiçbirisinde": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "hiçbirisini": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "hiçbirisiyle": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "hiçbirisi": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "hiçbirisinin": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "hiçbirine": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "hiçbirinden": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "hiçbirinde": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "hiçbirini": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "hiçbiriyle": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "hiçbiri": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "hiçbirinin": { - "LEMMA": "hiçbiri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "kimisine": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "kimisinden": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" 
- }, - "kimisinde": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "kimisini": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "kimisiyle": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "kimisi": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "kimisinin": { - "LEMMA": "kimi", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "kimilerine": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "kimilerinden": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "kimilerinde": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "kimilerini": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "kimileriyle": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "kimileri": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "kimilerinin": { - "LEMMA": "kimileri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "kimseye": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "kimseden": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "kimsede": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "kimseyi": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "kimseyle": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - 
"kimse": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "kimsenin": { - "LEMMA": "kimse", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "öbürüne": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "öbüründen": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "öbüründe": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "öbürünü": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "öbürüyle": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "öbürü": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "öbürünün": { - "LEMMA": "öbürü", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "öbürlerine": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "öbürlerinden": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "öbürlerinde": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "öbürlerini": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "öbürleriyle": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "öbürleri": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "öbürlerinin": { - "LEMMA": "öbürleri", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "ötekisine": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - 
"ötekisinden": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "ötekisinde": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "ötekisini": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "ötekisiyle": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "ötekisi": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "ötekisinin": { - "LEMMA": "öteki", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "pekçoğuna": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "pekçoğundan": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "pekçoğunda": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "pekçoğunu": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Acc", - "Number": "Sing" - }, - "pekçoğuyla": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "pekçoğu": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "pekçoğunun": { - "LEMMA": "pekçoğu", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - }, - "pekçoklarına": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Dat", - "Number": "Sing" - }, - "pekçoklarından": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Abl", - "Number": "Sing" - }, - "pekçoklarında": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Loc", - "Number": "Sing" - }, - "pekçoklarını": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": 
"Acc", - "Number": "Sing" - }, - "pekçoklarıyla": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Ins", - "Number": "Sing" - }, - "pekçokları": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Nom", - "Number": "Sing" - }, - "pekçoklarının": { - "LEMMA": "pekçokları", - "POS": "PRON", - "PronType": "Ind", - "Case": "Gen", - "Number": "Sing" - } - } - } - -for tag, rules in MORPH_RULES.items(): - for key, attrs in dict(rules).items(): - rules[key.title()] = attrs diff --git a/spacy/lang/tr/syntax_iterators.py b/spacy/lang/tr/syntax_iterators.py index d9b342949..3fd726fb5 100644 --- a/spacy/lang/tr/syntax_iterators.py +++ b/spacy/lang/tr/syntax_iterators.py @@ -1,6 +1,3 @@ -# coding: utf8 -from __future__ import unicode_literals - from ...symbols import NOUN, PROPN, PRON from ...errors import Errors From a93d42861d683081793f43a40eb42122cc247cca Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Thu, 15 Oct 2020 09:44:21 +0200 Subject: [PATCH 24/31] Use null raw for has_unknown_spaces in docs_to_json --- spacy/training/gold_io.pyx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/spacy/training/gold_io.pyx b/spacy/training/gold_io.pyx index 8fb6b8565..327748d01 100644 --- a/spacy/training/gold_io.pyx +++ b/spacy/training/gold_io.pyx @@ -20,7 +20,8 @@ def docs_to_json(docs, doc_id=0, ner_missing_tag="O"): docs = [docs] json_doc = {"id": doc_id, "paragraphs": []} for i, doc in enumerate(docs): - json_para = {'raw': doc.text, "sentences": [], "cats": [], "entities": [], "links": []} + raw = None if doc.has_unknown_spaces else doc.text + json_para = {'raw': raw, "sentences": [], "cats": [], "entities": [], "links": []} for cat, val in doc.cats.items(): json_cat = {"label": cat, "value": val} json_para["cats"].append(json_cat) From d165af26be7813b7982a071d387859503d7decbb Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 10:08:53 +0200 Subject: [PATCH 25/31] Auto-format [ci skip] --- 
spacy/lang/hi/lex_attrs.py | 7 ++++--- spacy/tests/conftest.py | 2 +- spacy/tests/lang/hi/test_lex_attrs.py | 6 ++++-- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/spacy/lang/hi/lex_attrs.py b/spacy/lang/hi/lex_attrs.py index 6ae9812d6..a18c2e513 100644 --- a/spacy/lang/hi/lex_attrs.py +++ b/spacy/lang/hi/lex_attrs.py @@ -74,7 +74,7 @@ _eleven_to_beyond = [ "बावन", "तिरपन", "तिरेपन", "चौवन", "चउवन", - "पचपन", + "पचपन", "छप्पन", "सतावन", "सत्तावन", "अठावन", @@ -91,7 +91,7 @@ _eleven_to_beyond = [ "उनहत्तर", "सत्तर", "इकहत्तर" - "बहत्तर", + "बहत्तर", "तिहत्तर", "चौहत्तर", "पचहत्तर", @@ -144,6 +144,7 @@ _ordinal_words_one_to_ten = [ _ordinal_suffix = "वाँ" # fmt: on + def norm(string): # normalise base exceptions, e.g. punctuation or currency symbols if string in BASE_NORMS: @@ -180,7 +181,7 @@ def like_num(text): if text in _ordinal_words_one_to_ten: return True if text.endswith(_ordinal_suffix): - if text[:-len(_ordinal_suffix)] in _eleven_to_beyond: + if text[: -len(_ordinal_suffix)] in _eleven_to_beyond: return True return False diff --git a/spacy/tests/conftest.py b/spacy/tests/conftest.py index 2d34cf0d5..3733d345d 100644 --- a/spacy/tests/conftest.py +++ b/spacy/tests/conftest.py @@ -297,7 +297,7 @@ def zh_tokenizer_pkuseg(): "segmenter": "pkuseg", } }, - "initialize": {"tokenizer": {"pkuseg_model": "web",}}, + "initialize": {"tokenizer": {"pkuseg_model": "web"}}, } nlp = get_lang_class("zh").from_config(config) nlp.initialize() diff --git a/spacy/tests/lang/hi/test_lex_attrs.py b/spacy/tests/lang/hi/test_lex_attrs.py index 187a23cb4..80a7cc1c4 100644 --- a/spacy/tests/lang/hi/test_lex_attrs.py +++ b/spacy/tests/lang/hi/test_lex_attrs.py @@ -28,14 +28,16 @@ def test_hi_norm(word, word_norm): @pytest.mark.parametrize( - "word", ["१९८७", "1987", "१२,२६७", "उन्नीस", "पाँच", "नवासी", "५/१०"], + "word", + ["१९८७", "1987", "१२,२६७", "उन्नीस", "पाँच", "नवासी", "५/१०"], ) def test_hi_like_num(word): assert like_num(word) @pytest.mark.parametrize( - 
"word", ["पहला", "तृतीय", "निन्यानवेवाँ", "उन्नीस", "तिहत्तरवाँ", "छत्तीसवाँ",], + "word", + ["पहला", "तृतीय", "निन्यानवेवाँ", "उन्नीस", "तिहत्तरवाँ", "छत्तीसवाँ"], ) def test_hi_like_num_ordinal_words(word): assert like_num(word) From b1d568a4dffca6828824953d0cfa0e56ae3cfbfc Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 10:20:21 +0200 Subject: [PATCH 26/31] Tidy up tests --- spacy/tests/regression/test_issue5501-6000.py | 76 +++++++++++++++++++ spacy/tests/regression/test_issue5551.py | 37 --------- spacy/tests/regression/test_issue5838.py | 23 ------ spacy/tests/regression/test_issue5918.py | 29 ------- .../test_resource_warning.py} | 0 5 files changed, 76 insertions(+), 89 deletions(-) create mode 100644 spacy/tests/regression/test_issue5501-6000.py delete mode 100644 spacy/tests/regression/test_issue5551.py delete mode 100644 spacy/tests/regression/test_issue5838.py delete mode 100644 spacy/tests/regression/test_issue5918.py rename spacy/tests/{regression/test_issue5230.py => serialize/test_resource_warning.py} (100%) diff --git a/spacy/tests/regression/test_issue5501-6000.py b/spacy/tests/regression/test_issue5501-6000.py new file mode 100644 index 000000000..f0b46cb83 --- /dev/null +++ b/spacy/tests/regression/test_issue5501-6000.py @@ -0,0 +1,76 @@ +from thinc.api import fix_random_seed +from spacy.lang.en import English +from spacy.tokens import Span +from spacy import displacy +from spacy.pipeline import merge_entities + + +def test_issue5551(): + """Test that after fixing the random seed, the results of the pipeline are truly identical""" + component = "textcat" + pipe_cfg = { + "model": { + "@architectures": "spacy.TextCatBOW.v1", + "exclusive_classes": True, + "ngram_size": 2, + "no_output_layer": False, + } + } + results = [] + for i in range(3): + fix_random_seed(0) + nlp = English() + example = ( + "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.", + {"cats": {"Labe1": 1.0, "Label2": 0.0, 
"Label3": 0.0}}, + ) + pipe = nlp.add_pipe(component, config=pipe_cfg, last=True) + for label in set(example[1]["cats"]): + pipe.add_label(label) + nlp.initialize() + # Store the result of each iteration + result = pipe.model.predict([nlp.make_doc(example[0])]) + results.append(list(result[0])) + # All results should be the same because of the fixed seed + assert len(results) == 3 + assert results[0] == results[1] + assert results[0] == results[2] + + +def test_issue5838(): + # Displacy's EntityRenderer break line + # not working after last entity + sample_text = "First line\nSecond line, with ent\nThird line\nFourth line\n" + nlp = English() + doc = nlp(sample_text) + doc.ents = [Span(doc, 7, 8, label="test")] + html = displacy.render(doc, style="ent") + found = html.count("
") + assert found == 4 + + +def test_issue5918(): + # Test edge case when merging entities. + nlp = English() + ruler = nlp.add_pipe("entity_ruler") + patterns = [ + {"label": "ORG", "pattern": "Digicon Inc"}, + {"label": "ORG", "pattern": "Rotan Mosle Inc's"}, + {"label": "ORG", "pattern": "Rotan Mosle Technology Partners Ltd"}, + ] + ruler.add_patterns(patterns) + + text = """ + Digicon Inc said it has completed the previously-announced disposition + of its computer systems division to an investment group led by + Rotan Mosle Inc's Rotan Mosle Technology Partners Ltd affiliate. + """ + doc = nlp(text) + assert len(doc.ents) == 3 + # make it so that the third span's head is within the entity (ent_iob=I) + # bug #5918 would wrongly transfer that I to the full entity, resulting in 2 instead of 3 final ents. + # TODO: test for logging here + # with pytest.warns(UserWarning): + # doc[29].head = doc[33] + doc = merge_entities(doc) + assert len(doc.ents) == 3 diff --git a/spacy/tests/regression/test_issue5551.py b/spacy/tests/regression/test_issue5551.py deleted file mode 100644 index 655764362..000000000 --- a/spacy/tests/regression/test_issue5551.py +++ /dev/null @@ -1,37 +0,0 @@ -from spacy.lang.en import English -from spacy.util import fix_random_seed - - -def test_issue5551(): - """Test that after fixing the random seed, the results of the pipeline are truly identical""" - component = "textcat" - pipe_cfg = { - "model": { - "@architectures": "spacy.TextCatBOW.v1", - "exclusive_classes": True, - "ngram_size": 2, - "no_output_layer": False, - } - } - - results = [] - for i in range(3): - fix_random_seed(0) - nlp = English() - example = ( - "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.", - {"cats": {"Labe1": 1.0, "Label2": 0.0, "Label3": 0.0}}, - ) - pipe = nlp.add_pipe(component, config=pipe_cfg, last=True) - for label in set(example[1]["cats"]): - pipe.add_label(label) - nlp.initialize() - - # Store the result of each 
iteration - result = pipe.model.predict([nlp.make_doc(example[0])]) - results.append(list(result[0])) - - # All results should be the same because of the fixed seed - assert len(results) == 3 - assert results[0] == results[1] - assert results[0] == results[2] diff --git a/spacy/tests/regression/test_issue5838.py b/spacy/tests/regression/test_issue5838.py deleted file mode 100644 index 4e4d98beb..000000000 --- a/spacy/tests/regression/test_issue5838.py +++ /dev/null @@ -1,23 +0,0 @@ -from spacy.lang.en import English -from spacy.tokens import Span -from spacy import displacy - - -SAMPLE_TEXT = """First line -Second line, with ent -Third line -Fourth line -""" - - -def test_issue5838(): - # Displacy's EntityRenderer break line - # not working after last entity - - nlp = English() - doc = nlp(SAMPLE_TEXT) - doc.ents = [Span(doc, 7, 8, label="test")] - - html = displacy.render(doc, style="ent") - - found = html.count("</br>
") - assert found == 4 diff --git a/spacy/tests/regression/test_issue5918.py b/spacy/tests/regression/test_issue5918.py deleted file mode 100644 index d25323ef6..000000000 --- a/spacy/tests/regression/test_issue5918.py +++ /dev/null @@ -1,29 +0,0 @@ -from spacy.lang.en import English -from spacy.pipeline import merge_entities - - -def test_issue5918(): - # Test edge case when merging entities. - nlp = English() - ruler = nlp.add_pipe("entity_ruler") - patterns = [ - {"label": "ORG", "pattern": "Digicon Inc"}, - {"label": "ORG", "pattern": "Rotan Mosle Inc's"}, - {"label": "ORG", "pattern": "Rotan Mosle Technology Partners Ltd"}, - ] - ruler.add_patterns(patterns) - - text = """ - Digicon Inc said it has completed the previously-announced disposition - of its computer systems division to an investment group led by - Rotan Mosle Inc's Rotan Mosle Technology Partners Ltd affiliate. - """ - doc = nlp(text) - assert len(doc.ents) == 3 - # make it so that the third span's head is within the entity (ent_iob=I) - # bug #5918 would wrongly transfer that I to the full entity, resulting in 2 instead of 3 final ents. 
- # TODO: test for logging here - # with pytest.warns(UserWarning): - # doc[29].head = doc[33] - doc = merge_entities(doc) - assert len(doc.ents) == 3 diff --git a/spacy/tests/regression/test_issue5230.py b/spacy/tests/serialize/test_resource_warning.py similarity index 100% rename from spacy/tests/regression/test_issue5230.py rename to spacy/tests/serialize/test_resource_warning.py From 4fa869e6f72b152ebf15632ae1bceb18f5b03017 Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 11:16:06 +0200 Subject: [PATCH 27/31] Update docs [ci skip] --- website/docs/usage/_benchmarks-models.md | 7 ++- website/docs/usage/facts-figures.md | 12 ++++ website/src/widgets/features.js | 72 ++++++++++++++++++++++++ website/src/widgets/landing.js | 55 +----------------- 4 files changed, 91 insertions(+), 55 deletions(-) create mode 100644 website/src/widgets/features.js diff --git a/website/docs/usage/_benchmarks-models.md b/website/docs/usage/_benchmarks-models.md index 4e6da9ad8..33b174f75 100644 --- a/website/docs/usage/_benchmarks-models.md +++ b/website/docs/usage/_benchmarks-models.md @@ -7,13 +7,14 @@ import { Help } from 'components/typography'; import Link from 'components/link' | Pipeline | Parser | Tagger | NER | WPS
CPU words per second on CPU, higher is better | WPS
GPU words per second on GPU, higher is better | | ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: | | [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.7 | 1k | 8k | -| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | 92.2 | 97.4 | 85.8 | 7k | | -| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | | 10k | | +| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | 92.2 | 97.4 | 85.4 | 7k | | +| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.7 | 10k | |
**Full pipeline accuracy and speed** on the -[OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) corpus. +[OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19) corpus (reported on +the development set).
diff --git a/website/docs/usage/facts-figures.md b/website/docs/usage/facts-figures.md index c7a7d0525..52696b7dc 100644 --- a/website/docs/usage/facts-figures.md +++ b/website/docs/usage/facts-figures.md @@ -10,6 +10,18 @@ menu: ## Comparison {#comparison hidden="true"} +spaCy is a **free, open-source library** for advanced **Natural Language +Processing** (NLP) in Python. It's designed specifically for **production use** +and helps you build applications that process and "understand" large volumes of +text. It can be used to build information extraction or natural language +understanding systems. + +### Feature overview {#comparison-features} + +import Features from 'widgets/features.js' + + + ### When should I use spaCy? {#comparison-usage} - ✅ **I'm a beginner and just getting started with NLP.** – spaCy makes it easy diff --git a/website/src/widgets/features.js b/website/src/widgets/features.js new file mode 100644 index 000000000..73863d5cc --- /dev/null +++ b/website/src/widgets/features.js @@ -0,0 +1,72 @@ +import React from 'react' +import { graphql, StaticQuery } from 'gatsby' + +import { Ul, Li } from '../components/list' + +export default () => ( + { + const { counts } = site.siteMetadata + return ( +
    +
  • + ✅ Support for {counts.langs}+ languages +
  • +
  • + ✅ {counts.models} trained pipelines for{' '} + {counts.modelLangs} languages +
  • +
  • + ✅ Multi-task learning with pretrained transformers like + BERT +
  • +
  • + ✅ Pretrained word vectors +
  • +
  • ✅ State-of-the-art speed
  • +
  • + ✅ Production-ready training system +
  • +
  • + ✅ Linguistically-motivated tokenization +
  • +
  • + ✅ Components for named entity recognition, part-of-speech + tagging, dependency parsing, sentence segmentation,{' '} + text classification, lemmatization, morphological analysis, + entity linking and more +
  • +
  • + ✅ Easily extensible with custom components and attributes +
  • +
  • + ✅ Support for custom models in PyTorch,{' '} + TensorFlow and other frameworks +
  • +
  • + ✅ Built in visualizers for syntax and NER +
  • +
  • + ✅ Easy model packaging, deployment and workflow management +
  • +
  • ✅ Robust, rigorously evaluated accuracy
  • +
+ ) + }} + /> +) + +const query = graphql` + query FeaturesQuery { + site { + siteMetadata { + counts { + langs + modelLangs + models + } + } + } + } +` diff --git a/website/src/widgets/landing.js b/website/src/widgets/landing.js index 46be93ab5..2cee9460f 100644 --- a/website/src/widgets/landing.js +++ b/website/src/widgets/landing.js @@ -14,13 +14,13 @@ import { LandingBanner, } from '../components/landing' import { H2 } from '../components/typography' -import { Ul, Li } from '../components/list' import { InlineCode } from '../components/code' import Button from '../components/button' import Link from '../components/link' import QuickstartTraining from './quickstart-training' import Project from './project' +import Features from './features' import courseImage from '../../docs/images/course.jpg' import prodigyImage from '../../docs/images/prodigy_overview.jpg' import projectsImage from '../../docs/images/projects.png' @@ -56,7 +56,7 @@ for entity in doc.ents: } const Landing = ({ data }) => { - const { counts, nightly } = data + const { nightly } = data const codeExample = getCodeExample(nightly) return ( <> @@ -98,51 +98,7 @@ const Landing = ({ data }) => {

Features

-
    -
  • - ✅ Support for {counts.langs}+ languages -
  • -
  • - ✅ {counts.models} trained pipelines for{' '} - {counts.modelLangs} languages -
  • -
  • - ✅ Multi-task learning with pretrained transformers{' '} - like BERT -
  • -
  • - ✅ Pretrained word vectors -
  • -
  • ✅ State-of-the-art speed
  • -
  • - ✅ Production-ready training system -
  • -
  • - ✅ Linguistically-motivated tokenization -
  • -
  • - ✅ Components for named entity recognition, - part-of-speech tagging, dependency parsing, sentence segmentation,{' '} - text classification, lemmatization, morphological - analysis, entity linking and more -
  • -
  • - ✅ Easily extensible with custom components and - attributes -
  • -
  • - ✅ Support for custom models in PyTorch,{' '} - TensorFlow and other frameworks -
  • -
  • - ✅ Built in visualizers for syntax and NER -
  • -
  • - ✅ Easy model packaging, deployment and workflow - management -
  • -
  • ✅ Robust, rigorously evaluated accuracy
  • -
+                    <Features />
@@ -333,11 +289,6 @@ const landingQuery = graphql` siteMetadata { nightly repo - counts { - langs - modelLangs - models - } } } } From 7f05ccc1709aaf9d8901767349fc6bd9e87a96e4 Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 12:35:30 +0200 Subject: [PATCH 28/31] Update docs [ci skip] --- website/docs/usage/_benchmarks-models.md | 2 +- website/docs/usage/v3.md | 25 +++++++++++++++++++----- 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/website/docs/usage/_benchmarks-models.md b/website/docs/usage/_benchmarks-models.md index 33b174f75..a722c894f 100644 --- a/website/docs/usage/_benchmarks-models.md +++ b/website/docs/usage/_benchmarks-models.md @@ -6,7 +6,7 @@ import { Help } from 'components/typography'; import Link from 'components/link' | Pipeline | Parser | Tagger | NER | WPS
CPU words per second on CPU, higher is better | WPS
GPU words per second on GPU, higher is better | | ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: | -| [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.7 | 1k | 8k | +| [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.4 | 1k | 8k | | [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | 92.2 | 97.4 | 85.4 | 7k | | | `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.7 | 10k | | diff --git a/website/docs/usage/v3.md b/website/docs/usage/v3.md index 9191a7db2..d9d636bb1 100644 --- a/website/docs/usage/v3.md +++ b/website/docs/usage/v3.md @@ -77,6 +77,26 @@ import Benchmarks from 'usage/\_benchmarks-models.md' +#### New trained transformer-based pipelines {#features-transformers-pipelines} + +> #### Notes on model capabilities +> +> The models are each trained with a **single transformer** shared across the +> pipeline, which requires it to be trained on a single corpus. For +> [English](/models/en) and [Chinese](/models/zh), we used the OntoNotes 5 +> corpus, which has annotations across several tasks. For [French](/models/fr), +> [Spanish](/models/es) and [German](/models/de), we didn't have a suitable +> corpus that had both syntactic and entity annotations, so the transformer +> models for those languages do not include NER. 
+ +| Package | Language | Transformer | Tagger | Parser |  NER | +| ------------------------------------------------ | -------- | --------------------------------------------------------------------------------------------- | -----: | -----: | ---: | +| [`en_core_web_trf`](/models/en#en_core_web_trf) | English | [`roberta-base`](https://huggingface.co/roberta-base) | 97.8 | 95.0 | 89.4 | +| [`de_dep_news_trf`](/models/de#de_dep_news_trf) | German | [`bert-base-german-cased`](https://huggingface.co/bert-base-german-cased) | 99.0 | 95.8 | - | +| [`es_dep_news_trf`](/models/es#es_dep_news_trf) | Spanish | [`bert-base-spanish-wwm-cased`](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 98.2 | 94.6 | - | +| [`fr_dep_news_trf`](/models/fr#fr_dep_news_trf) | French | [`camembert-base`](https://huggingface.co/camembert-base) | 95.7 | 94.9 | - | +| [`zh_core_web_trf`](/models/zh#zh_core_news_trf) | Chinese | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese) | 92.5 | 77.2 | 75.6 | + - **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers), @@ -88,11 +108,6 @@ import Benchmarks from 'usage/\_benchmarks-models.md' - **Architectures: ** [TransformerModel](/api/architectures#TransformerModel), [TransformerListener](/api/architectures#TransformerListener), [Tok2VecTransformer](/api/architectures#Tok2VecTransformer) -- **Trained Pipelines:** [`en_core_web_trf`](/models/en#en_core_web_trf), - [`de_dep_news_trf`](/models/de#de_dep_news_trf), - [`es_dep_news_trf`](/models/es#es_dep_news_trf), - [`fr_dep_news_trf`](/models/fr#fr_dep_news_trf), - [`zh_core_web_trf`](/models/zh#zh_core_web_trf) - **Implementation:** [`spacy-transformers`](https://github.com/explosion/spacy-transformers) From 10611bf56ad14934a611b8ead9e2177c91b58d9e Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 13:30:11 +0200 Subject: [PATCH 29/31] Increment version [ci skip] --- spacy/about.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff 
--git a/spacy/about.py b/spacy/about.py index 9c5dd0b4f..a19c785bc 100644 --- a/spacy/about.py +++ b/spacy/about.py @@ -1,6 +1,6 @@ # fmt: off __title__ = "spacy-nightly" -__version__ = "3.0.0a41" +__version__ = "3.0.0rc0" __download_url__ = "https://github.com/explosion/spacy-models/releases/download" __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json" __projects__ = "https://github.com/explosion/projects" From ff4267d1812d853c79b7e4e937dc29e0e849155c Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 14:42:08 +0200 Subject: [PATCH 30/31] Fix success message [ci skip] --- spacy/about.py | 2 +- spacy/training/loop.py | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/spacy/about.py b/spacy/about.py index a19c785bc..bf1d53a7b 100644 --- a/spacy/about.py +++ b/spacy/about.py @@ -1,6 +1,6 @@ # fmt: off __title__ = "spacy-nightly" -__version__ = "3.0.0rc0" +__version__ = "3.0.0rc1" __download_url__ = "https://github.com/explosion/spacy-models/releases/download" __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json" __projects__ = "https://github.com/explosion/projects" diff --git a/spacy/training/loop.py b/spacy/training/loop.py index c3fa83b39..eecb3e273 100644 --- a/spacy/training/loop.py +++ b/spacy/training/loop.py @@ -112,10 +112,10 @@ def train( nlp.to_disk(final_model_path) else: nlp.to_disk(final_model_path) - # This will only run if we don't hit an error - stdout.write( - msg.good("Saved pipeline to output directory", final_model_path) + "\n" - ) + # This will only run if we don't hit an error + stdout.write( + msg.good("Saved pipeline to output directory", final_model_path) + "\n" + ) def train_while_improving( From 09dbbe75d7350af3c42b070715f07c39b9489104 Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Thu, 15 Oct 2020 17:27:24 +0200 Subject: [PATCH 31/31] Update docs [ci skip] --- 
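The `loop.py` hunk in patch 30 above is purely an indentation change, which is easy to misread in a flattened diff: the `stdout.write(msg.good(...))` call was nested one level too deep, so the "Saved pipeline" message only ran on one code path. A toy reconstruction of the control flow (names simplified and hypothetical; the real `train()` also handles callbacks and exceptions):

```python
def save_final(nlp_to_disk, final_model_path, under_lock=False):
    # Hypothetical stand-in for the end of spacy.training.loop.train.
    messages = []
    if under_lock:
        nlp_to_disk(final_model_path)  # e.g. inside a lock/context manager
    else:
        nlp_to_disk(final_model_path)
    # Patch 30 dedents the success message to this level: previously it sat
    # under the `else:` branch and was skipped on the other path, despite the
    # "This will only run if we don't hit an error" comment.
    messages.append(f"Saved pipeline to output directory {final_model_path}")
    return messages
```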
website/docs/usage/_benchmarks-models.md | 27 ++++++++++-------------- website/docs/usage/facts-figures.md | 25 ---------------------- 2 files changed, 11 insertions(+), 41 deletions(-) diff --git a/website/docs/usage/_benchmarks-models.md b/website/docs/usage/_benchmarks-models.md index a722c894f..1e755e39d 100644 --- a/website/docs/usage/_benchmarks-models.md +++ b/website/docs/usage/_benchmarks-models.md @@ -1,14 +1,12 @@ import { Help } from 'components/typography'; import Link from 'components/link' - -
-| Pipeline | Parser | Tagger | NER | WPS
CPU words per second on CPU, higher is better | WPS
GPU words per second on GPU, higher is better | -| ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: | -| [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.4 | 1k | 8k | -| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | 92.2 | 97.4 | 85.4 | 7k | | -| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.7 | 10k | | +| Pipeline | Parser | Tagger | NER | +| ---------------------------------------------------------- | -----: | -----: | ---: | +| [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.4 | +| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | 92.2 | 97.4 | 85.4 | +| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.5 |
@@ -22,13 +20,11 @@ the development set).
-| Named Entity Recognition System | OntoNotes | CoNLL '03 | -| ------------------------------------------------------------------------------ | --------: | --------: | -| spaCy RoBERTa (2020) | 89.7 | 91.6 | -| spaCy CNN (2020) | 84.5 | 87.4 | -| [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)1 | 88.8 | 92.1 | -| Flair2 | 89.7 | 93.1 | -| BERT Base3 | - | 92.4 | +| Named Entity Recognition System | OntoNotes | CoNLL '03 | +| -------------------------------- | --------: | --------: | +| spaCy RoBERTa (2020) | 89.7 | 91.6 | +| Stanza (StanfordNLP)1 | 88.8 | 92.1 | +| Flair2 | 89.7 | 93.1 |
@@ -39,8 +35,7 @@ the development set). more results. Project template: [`benchmarks/ner_conll03`](%%GITHUB_PROJECTS/benchmarks/ner_conll03). **1. ** [Qi et al. (2020)](https://arxiv.org/pdf/2003.07082.pdf). **2. ** -[Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/). **3. ** -[Devlin et al. (2018)](https://arxiv.org/abs/1810.04805). +[Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/).
diff --git a/website/docs/usage/facts-figures.md b/website/docs/usage/facts-figures.md index 52696b7dc..269ac5e17 100644 --- a/website/docs/usage/facts-figures.md +++ b/website/docs/usage/facts-figures.md @@ -78,7 +78,6 @@ import Benchmarks from 'usage/\_benchmarks-models.md' | Dependency Parsing System | UAS | LAS | | ------------------------------------------------------------------------------ | ---: | ---: | | spaCy RoBERTa (2020) | 95.5 | 94.3 | -| spaCy CNN (2020) | | | | [Mrini et al.](https://khalilmrini.github.io/Label_Attention_Layer.pdf) (2019) | 97.4 | 96.3 | | [Zhou and Zhao](https://www.aclweb.org/anthology/P19-1230/) (2019) | 97.2 | 95.7 | @@ -93,30 +92,6 @@ results. Project template:
-### Speed comparison {#benchmarks-speed} - - - -
- -| Library | Pipeline | WPS CPU words per second on CPU, higher is better | WPS GPU words per second on GPU, higher is better | -| ------- | ----------------------------------------------- | -------------------------------------------------------------: | -------------------------------------------------------------: | -| spaCy | [`en_core_web_md`](/models/en#en_core_web_md) | -| spaCy | [`en_core_web_trf`](/models/en#en_core_web_trf) | -| Stanza | `en_ewt` | | -| Flair | `pos-fast_ner-fast` | -| Flair | `pos_ner` | -| UDPipe | `english-ewt-ud-2.5` | - -
- -**End-to-end processing speed** on raw unannotated text. Project template: -[`benchmarks/speed`](%%GITHUB_PROJECTS/benchmarks/speed). - -
- -
-
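The WPS columns that the final hunk removes report end-to-end throughput in words per second. A rough sketch of how such a figure can be computed (whitespace word counts over a single timed pass; the real `benchmarks/speed` project template is more careful about warm-up, batching and repeated runs):

```python
import time


def words_per_second(process, texts):
    # Rough throughput sketch: whitespace-token count / wall-clock seconds.
    n_words = sum(len(text.split()) for text in texts)
    start = time.perf_counter()
    for text in texts:
        process(text)
    elapsed = time.perf_counter() - start
    return n_words / max(elapsed, 1e-9)


# str.upper stands in for a real nlp() callable in this sketch.
texts = ["This is a sentence .", "Another short one ."] * 1000
wps = words_per_second(str.upper, texts)
```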