mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-11-04 01:48:04 +03:00 
			
		
		
		
	Merge pull request #6255 from explosion/master-tmp
		
						commit
						db16059f9b
					
				
							
								
								
									
										106
									
								
								.github/contributors/Nuccy90.md
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1,106 @@
 | 
				
			||||||
 | 
					# spaCy contributor agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This spaCy Contributor Agreement (**"SCA"**) is based on the
 | 
				
			||||||
 | 
					[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 | 
				
			||||||
 | 
					The SCA applies to any contribution that you make to any product or project
 | 
				
			||||||
 | 
					managed by us (the **"project"**), and sets out the intellectual property rights
 | 
				
			||||||
 | 
					you grant to us in the contributed materials. The term **"us"** shall mean
 | 
				
			||||||
 | 
					[ExplosionAI GmbH](https://explosion.ai/legal). The term
 | 
				
			||||||
 | 
					**"you"** shall mean the person or entity identified below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you agree to be bound by these terms, fill in the information requested
 | 
				
			||||||
 | 
					below and include the filled-in version with your first pull request, under the
 | 
				
			||||||
 | 
					folder [`.github/contributors/`](/.github/contributors/). The name of the file
 | 
				
			||||||
 | 
					should be your GitHub username, with the extension `.md`. For example, the user
 | 
				
			||||||
 | 
					example_user would create the file `.github/contributors/example_user.md`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Read this agreement carefully before signing. These terms and conditions
 | 
				
			||||||
 | 
					constitute a binding legal agreement.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. The term "contribution" or "contributed materials" means any source code,
 | 
				
			||||||
 | 
					object code, patch, tool, sample, graphic, specification, manual,
 | 
				
			||||||
 | 
					documentation, or any other material posted or submitted by you to the project.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2. With respect to any worldwide copyrights, or copyright applications and
 | 
				
			||||||
 | 
					registrations, in your contribution:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you hereby assign to us joint ownership, and to the extent that such
 | 
				
			||||||
 | 
					    assignment is or becomes invalid, ineffective or unenforceable, you hereby
 | 
				
			||||||
 | 
					    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
 | 
				
			||||||
 | 
					    royalty-free, unrestricted license to exercise all rights under those
 | 
				
			||||||
 | 
					    copyrights. This includes, at our option, the right to sublicense these same
 | 
				
			||||||
 | 
					    rights to third parties through multiple levels of sublicensees or other
 | 
				
			||||||
 | 
					    licensing arrangements;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that each of us can do all things in relation to your
 | 
				
			||||||
 | 
					    contribution as if each of us were the sole owners, and if one of us makes
 | 
				
			||||||
 | 
					    a derivative work of your contribution, the one who makes the derivative
 | 
				
			||||||
 | 
					    work (or has it made) will be the sole owner of that derivative work;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that you will not assert any moral rights in your contribution
 | 
				
			||||||
 | 
					    against us, our licensees or transferees;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that we may register a copyright in your contribution and
 | 
				
			||||||
 | 
					    exercise all ownership rights associated with it; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that neither of us has any duty to consult with, obtain the
 | 
				
			||||||
 | 
					    consent of, pay or render an accounting to the other for any use or
 | 
				
			||||||
 | 
					    distribution of your contribution.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					3. With respect to any patents you own, or that you can license without payment
 | 
				
			||||||
 | 
					to any third party, you hereby grant to us a perpetual, irrevocable,
 | 
				
			||||||
 | 
					non-exclusive, worldwide, no-charge, royalty-free license to:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * make, have made, use, sell, offer to sell, import, and otherwise transfer
 | 
				
			||||||
 | 
					    your contribution in whole or in part, alone or in combination with or
 | 
				
			||||||
 | 
					    included in any product, work or materials arising out of the project to
 | 
				
			||||||
 | 
					    which your contribution was submitted, and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * at our option, to sublicense these same rights to third parties through
 | 
				
			||||||
 | 
					    multiple levels of sublicensees or other licensing arrangements.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					4. Except as set out above, you keep all right, title, and interest in your
 | 
				
			||||||
 | 
					contribution. The rights that you grant to us under these terms are effective
 | 
				
			||||||
 | 
					on the date you first submitted a contribution to us, even if your submission
 | 
				
			||||||
 | 
					took place before the date you sign these terms.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					5. You covenant, represent, warrant and agree that:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * Each contribution that you submit is and shall be an original work of
 | 
				
			||||||
 | 
					    authorship and you can legally grant the rights set out in this SCA;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * to the best of your knowledge, each contribution will not violate any
 | 
				
			||||||
 | 
					    third party's copyrights, trademarks, patents, or other intellectual
 | 
				
			||||||
 | 
					    property rights; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * each contribution shall be in compliance with U.S. export control laws and
 | 
				
			||||||
 | 
					    other applicable export and import laws. You agree to notify us if you
 | 
				
			||||||
 | 
					    become aware of any circumstance which would make any of the foregoing
 | 
				
			||||||
 | 
					    representations inaccurate in any respect. We may publicly disclose your
 | 
				
			||||||
 | 
					    participation in the project, including the fact that you have signed the SCA.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					6. This SCA is governed by the laws of the State of California and applicable
 | 
				
			||||||
 | 
					U.S. Federal law. Any choice of law rules will not apply.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					7. Please place an “x” on one of the applicable statements below. Please do NOT
 | 
				
			||||||
 | 
					mark both statements:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [x] I am signing on behalf of myself as an individual and no other person
 | 
				
			||||||
 | 
					    or entity, including my employer, has or will have rights with respect to my
 | 
				
			||||||
 | 
					    contributions.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [ ] I am signing on behalf of my employer or a legal entity and I have the
 | 
				
			||||||
 | 
					    actual authority to contractually bind that entity.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Details
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					| Field                          | Entry                |
 | 
				
			||||||
 | 
					|------------------------------- | -------------------- |
 | 
				
			||||||
 | 
					| Name                           | Elena Fano           |
 | 
				
			||||||
 | 
					| Company name (if applicable)   |                      |
 | 
				
			||||||
 | 
					| Title or role (if applicable)  |                      |
 | 
				
			||||||
 | 
					| Date                           | 2020-09-21           |
 | 
				
			||||||
 | 
					| GitHub username                | Nuccy90              |
 | 
				
			||||||
 | 
					| Website (optional)             |                      |
 | 
				
			||||||
							
								
								
									
										106
									
								
								.github/contributors/rahul1990gupta.md
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1,106 @@
 | 
				
			||||||
 | 
					# spaCy contributor agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This spaCy Contributor Agreement (**"SCA"**) is based on the
 | 
				
			||||||
 | 
					[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 | 
				
			||||||
 | 
					The SCA applies to any contribution that you make to any product or project
 | 
				
			||||||
 | 
					managed by us (the **"project"**), and sets out the intellectual property rights
 | 
				
			||||||
 | 
					you grant to us in the contributed materials. The term **"us"** shall mean
 | 
				
			||||||
 | 
					[ExplosionAI GmbH](https://explosion.ai/legal). The term
 | 
				
			||||||
 | 
					**"you"** shall mean the person or entity identified below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you agree to be bound by these terms, fill in the information requested
 | 
				
			||||||
 | 
					below and include the filled-in version with your first pull request, under the
 | 
				
			||||||
 | 
					folder [`.github/contributors/`](/.github/contributors/). The name of the file
 | 
				
			||||||
 | 
					should be your GitHub username, with the extension `.md`. For example, the user
 | 
				
			||||||
 | 
					example_user would create the file `.github/contributors/example_user.md`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Read this agreement carefully before signing. These terms and conditions
 | 
				
			||||||
 | 
					constitute a binding legal agreement.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. The term "contribution" or "contributed materials" means any source code,
 | 
				
			||||||
 | 
					object code, patch, tool, sample, graphic, specification, manual,
 | 
				
			||||||
 | 
					documentation, or any other material posted or submitted by you to the project.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2. With respect to any worldwide copyrights, or copyright applications and
 | 
				
			||||||
 | 
					registrations, in your contribution:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you hereby assign to us joint ownership, and to the extent that such
 | 
				
			||||||
 | 
					    assignment is or becomes invalid, ineffective or unenforceable, you hereby
 | 
				
			||||||
 | 
					    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
 | 
				
			||||||
 | 
					    royalty-free, unrestricted license to exercise all rights under those
 | 
				
			||||||
 | 
					    copyrights. This includes, at our option, the right to sublicense these same
 | 
				
			||||||
 | 
					    rights to third parties through multiple levels of sublicensees or other
 | 
				
			||||||
 | 
					    licensing arrangements;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that each of us can do all things in relation to your
 | 
				
			||||||
 | 
					    contribution as if each of us were the sole owners, and if one of us makes
 | 
				
			||||||
 | 
					    a derivative work of your contribution, the one who makes the derivative
 | 
				
			||||||
 | 
					    work (or has it made) will be the sole owner of that derivative work;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that you will not assert any moral rights in your contribution
 | 
				
			||||||
 | 
					    against us, our licensees or transferees;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that we may register a copyright in your contribution and
 | 
				
			||||||
 | 
					    exercise all ownership rights associated with it; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that neither of us has any duty to consult with, obtain the
 | 
				
			||||||
 | 
					    consent of, pay or render an accounting to the other for any use or
 | 
				
			||||||
 | 
					    distribution of your contribution.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					3. With respect to any patents you own, or that you can license without payment
 | 
				
			||||||
 | 
					to any third party, you hereby grant to us a perpetual, irrevocable,
 | 
				
			||||||
 | 
					non-exclusive, worldwide, no-charge, royalty-free license to:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * make, have made, use, sell, offer to sell, import, and otherwise transfer
 | 
				
			||||||
 | 
					    your contribution in whole or in part, alone or in combination with or
 | 
				
			||||||
 | 
					    included in any product, work or materials arising out of the project to
 | 
				
			||||||
 | 
					    which your contribution was submitted, and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * at our option, to sublicense these same rights to third parties through
 | 
				
			||||||
 | 
					    multiple levels of sublicensees or other licensing arrangements.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					4. Except as set out above, you keep all right, title, and interest in your
 | 
				
			||||||
 | 
					contribution. The rights that you grant to us under these terms are effective
 | 
				
			||||||
 | 
					on the date you first submitted a contribution to us, even if your submission
 | 
				
			||||||
 | 
					took place before the date you sign these terms.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					5. You covenant, represent, warrant and agree that:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * Each contribution that you submit is and shall be an original work of
 | 
				
			||||||
 | 
					    authorship and you can legally grant the rights set out in this SCA;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * to the best of your knowledge, each contribution will not violate any
 | 
				
			||||||
 | 
					    third party's copyrights, trademarks, patents, or other intellectual
 | 
				
			||||||
 | 
					    property rights; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * each contribution shall be in compliance with U.S. export control laws and
 | 
				
			||||||
 | 
					    other applicable export and import laws. You agree to notify us if you
 | 
				
			||||||
 | 
					    become aware of any circumstance which would make any of the foregoing
 | 
				
			||||||
 | 
					    representations inaccurate in any respect. We may publicly disclose your
 | 
				
			||||||
 | 
					    participation in the project, including the fact that you have signed the SCA.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					6. This SCA is governed by the laws of the State of California and applicable
 | 
				
			||||||
 | 
					U.S. Federal law. Any choice of law rules will not apply.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					7. Please place an “x” on one of the applicable statements below. Please do NOT
 | 
				
			||||||
 | 
					mark both statements:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [x] I am signing on behalf of myself as an individual and no other person
 | 
				
			||||||
 | 
					    or entity, including my employer, has or will have rights with respect to my
 | 
				
			||||||
 | 
					    contributions.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [ ] I am signing on behalf of my employer or a legal entity and I have the
 | 
				
			||||||
 | 
					    actual authority to contractually bind that entity.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Details
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					| Field                          | Entry                |
 | 
				
			||||||
 | 
					|------------------------------- | -------------------- |
 | 
				
			||||||
 | 
					| Name                           |  Rahul Gupta         |
 | 
				
			||||||
 | 
					| Company name (if applicable)   |                      |
 | 
				
			||||||
 | 
					| Title or role (if applicable)  |                      |
 | 
				
			||||||
 | 
					| Date                           |  28 July 2020        |
 | 
				
			||||||
 | 
					| GitHub username                |  rahul1990gupta      |
 | 
				
			||||||
 | 
					| Website (optional)             |                      |
 | 
				
			||||||
| 
						 | 
					@ -10,23 +10,26 @@ _stem_suffixes = [
 | 
				
			||||||
    ["ाएगी", "ाएगा", "ाओगी", "ाओगे", "एंगी", "ेंगी", "एंगे", "ेंगे", "ूंगी", "ूंगा", "ातीं", "नाओं", "नाएं", "ताओं", "ताएं", "ियाँ", "ियों", "ियां"],
 | 
					    ["ाएगी", "ाएगा", "ाओगी", "ाओगे", "एंगी", "ेंगी", "एंगे", "ेंगे", "ूंगी", "ूंगा", "ातीं", "नाओं", "नाएं", "ताओं", "ताएं", "ियाँ", "ियों", "ियां"],
 | 
				
			||||||
    ["ाएंगी", "ाएंगे", "ाऊंगी", "ाऊंगा", "ाइयाँ", "ाइयों", "ाइयां"]
 | 
					    ["ाएंगी", "ाएंगे", "ाऊंगी", "ाऊंगा", "ाइयाँ", "ाइयों", "ाइयां"]
 | 
				
			||||||
]
 | 
					]
 | 
				
			||||||
# fmt: on
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
# reference 1:https://en.wikipedia.org/wiki/Indian_numbering_system
 | 
					# reference 1: https://en.wikipedia.org/wiki/Indian_numbering_system
 | 
				
			||||||
# reference 2: https://blogs.transparent.com/hindi/hindi-numbers-1-100/
 | 
					# reference 2: https://blogs.transparent.com/hindi/hindi-numbers-1-100/
 | 
				
			||||||
 | 
					# reference 3: https://www.mindurhindi.com/basic-words-and-phrases-in-hindi/
 | 
				
			||||||
 | 
					
 | 
				
			||||||
_num_words = [
 | 
					_one_to_ten = [
 | 
				
			||||||
    "शून्य",
 | 
					    "शून्य",
 | 
				
			||||||
    "एक",
 | 
					    "एक",
 | 
				
			||||||
    "दो",
 | 
					    "दो",
 | 
				
			||||||
    "तीन",
 | 
					    "तीन",
 | 
				
			||||||
    "चार",
 | 
					    "चार",
 | 
				
			||||||
    "पांच",
 | 
					    "पांच", "पाँच",
 | 
				
			||||||
    "छह",
 | 
					    "छह",
 | 
				
			||||||
    "सात",
 | 
					    "सात",
 | 
				
			||||||
    "आठ",
 | 
					    "आठ",
 | 
				
			||||||
    "नौ",
 | 
					    "नौ",
 | 
				
			||||||
    "दस",
 | 
					    "दस",
 | 
				
			||||||
 | 
					]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					_eleven_to_beyond = [
 | 
				
			||||||
    "ग्यारह",
 | 
					    "ग्यारह",
 | 
				
			||||||
    "बारह",
 | 
					    "बारह",
 | 
				
			||||||
    "तेरह",
 | 
					    "तेरह",
 | 
				
			||||||
| 
						 | 
					@ -37,13 +40,85 @@ _num_words = [
 | 
				
			||||||
    "अठारह",
 | 
					    "अठारह",
 | 
				
			||||||
    "उन्नीस",
 | 
					    "उन्नीस",
 | 
				
			||||||
    "बीस",
 | 
					    "बीस",
 | 
				
			||||||
 | 
					    "इकीस", "इक्कीस",
 | 
				
			||||||
 | 
					    "बाईस",
 | 
				
			||||||
 | 
					    "तेइस",
 | 
				
			||||||
 | 
					    "चौबीस",
 | 
				
			||||||
 | 
					    "पच्चीस",
 | 
				
			||||||
 | 
					    "छब्बीस",
 | 
				
			||||||
 | 
					    "सताइस", "सत्ताइस",
 | 
				
			||||||
 | 
					    "अट्ठाइस",
 | 
				
			||||||
 | 
					    "उनतीस",
 | 
				
			||||||
    "तीस",
 | 
					    "तीस",
 | 
				
			||||||
 | 
					    "इकतीस", "इकत्तीस",
 | 
				
			||||||
 | 
					    "बतीस", "बत्तीस",
 | 
				
			||||||
 | 
					    "तैंतीस",
 | 
				
			||||||
 | 
					    "चौंतीस",
 | 
				
			||||||
 | 
					    "पैंतीस",
 | 
				
			||||||
 | 
					    "छतीस", "छत्तीस",
 | 
				
			||||||
 | 
					    "सैंतीस",
 | 
				
			||||||
 | 
					    "अड़तीस",
 | 
				
			||||||
 | 
					    "उनतालीस", "उनत्तीस",
 | 
				
			||||||
    "चालीस",
 | 
					    "चालीस",
 | 
				
			||||||
 | 
					    "इकतालीस",
 | 
				
			||||||
 | 
					    "बयालीस",
 | 
				
			||||||
 | 
					    "तैतालीस",
 | 
				
			||||||
 | 
					    "चवालीस",
 | 
				
			||||||
 | 
					    "पैंतालीस",
 | 
				
			||||||
 | 
					    "छयालिस",
 | 
				
			||||||
 | 
					    "सैंतालीस",
 | 
				
			||||||
 | 
					    "अड़तालीस",
 | 
				
			||||||
 | 
					    "उनचास",
 | 
				
			||||||
    "पचास",
 | 
					    "पचास",
 | 
				
			||||||
 | 
					    "इक्यावन",
 | 
				
			||||||
 | 
					    "बावन",
 | 
				
			||||||
 | 
					    "तिरपन", "तिरेपन",
 | 
				
			||||||
 | 
					    "चौवन", "चउवन",
 | 
				
			||||||
 | 
					    "पचपन", 
 | 
				
			||||||
 | 
					    "छप्पन",
 | 
				
			||||||
 | 
					    "सतावन", "सत्तावन",
 | 
				
			||||||
 | 
					    "अठावन",
 | 
				
			||||||
 | 
					    "उनसठ",
 | 
				
			||||||
    "साठ",
 | 
					    "साठ",
 | 
				
			||||||
 | 
					    "इकसठ",
 | 
				
			||||||
 | 
					    "बासठ",
 | 
				
			||||||
 | 
					    "तिरसठ", "तिरेसठ",
 | 
				
			||||||
 | 
					    "चौंसठ",
 | 
				
			||||||
 | 
					    "पैंसठ",
 | 
				
			||||||
 | 
					    "छियासठ",
 | 
				
			||||||
 | 
					    "सड़सठ",
 | 
				
			||||||
 | 
					    "अड़सठ",
 | 
				
			||||||
 | 
					    "उनहत्तर",
 | 
				
			||||||
    "सत्तर",
 | 
					    "सत्तर",
 | 
				
			||||||
 | 
					    "इकहत्तर",
 | 
				
			||||||
 | 
					    "बहत्तर", 
 | 
				
			||||||
 | 
					    "तिहत्तर",
 | 
				
			||||||
 | 
					    "चौहत्तर",
 | 
				
			||||||
 | 
					    "पचहत्तर",
 | 
				
			||||||
 | 
					    "छिहत्तर",
 | 
				
			||||||
 | 
					    "सतहत्तर",
 | 
				
			||||||
 | 
					    "अठहत्तर",
 | 
				
			||||||
 | 
					    "उन्नासी", "उन्यासी",
 | 
				
			||||||
    "अस्सी",
 | 
					    "अस्सी",
 | 
				
			||||||
 | 
					    "इक्यासी",
 | 
				
			||||||
 | 
					    "बयासी",
 | 
				
			||||||
 | 
					    "तिरासी",
 | 
				
			||||||
 | 
					    "चौरासी",
 | 
				
			||||||
 | 
					    "पचासी",
 | 
				
			||||||
 | 
					    "छियासी",
 | 
				
			||||||
 | 
					    "सतासी",
 | 
				
			||||||
 | 
					    "अट्ठासी",
 | 
				
			||||||
 | 
					    "नवासी",
 | 
				
			||||||
    "नब्बे",
 | 
					    "नब्बे",
 | 
				
			||||||
 | 
					    "इक्यानवे",
 | 
				
			||||||
 | 
					    "बानवे",
 | 
				
			||||||
 | 
					    "तिरानवे",
 | 
				
			||||||
 | 
					    "चौरानवे",
 | 
				
			||||||
 | 
					    "पचानवे",
 | 
				
			||||||
 | 
					    "छियानवे",
 | 
				
			||||||
 | 
					    "सतानवे",
 | 
				
			||||||
 | 
					    "अट्ठानवे",
 | 
				
			||||||
 | 
					    "निन्यानवे",
 | 
				
			||||||
    "सौ",
 | 
					    "सौ",
 | 
				
			||||||
    "हज़ार",
 | 
					    "हज़ार",
 | 
				
			||||||
    "लाख",
 | 
					    "लाख",
 | 
				
			||||||
| 
						 | 
					@ -52,6 +127,22 @@ _num_words = [
 | 
				
			||||||
    "खरब",
 | 
					    "खरब",
 | 
				
			||||||
]
 | 
					]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					_num_words = _one_to_ten + _eleven_to_beyond
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					_ordinal_words_one_to_ten = [
 | 
				
			||||||
 | 
					    "प्रथम", "पहला",
 | 
				
			||||||
 | 
					    "द्वितीय", "दूसरा",
 | 
				
			||||||
 | 
					    "तृतीय", "तीसरा",
 | 
				
			||||||
 | 
					    "चौथा",
 | 
				
			||||||
 | 
					    "पांचवाँ",
 | 
				
			||||||
 | 
					    "छठा",
 | 
				
			||||||
 | 
					    "सातवाँ",
 | 
				
			||||||
 | 
					    "आठवाँ",
 | 
				
			||||||
 | 
					    "नौवाँ",
 | 
				
			||||||
 | 
					    "दसवाँ",
 | 
				
			||||||
 | 
					]
 | 
				
			||||||
 | 
					_ordinal_suffix = "वाँ"
 | 
				
			||||||
 | 
					# fmt: on
 | 
				
			||||||
 | 
					
 | 
				
			||||||
def norm(string):
 | 
					def norm(string):
 | 
				
			||||||
    # normalise base exceptions,  e.g. punctuation or currency symbols
 | 
					    # normalise base exceptions,  e.g. punctuation or currency symbols
 | 
				
			||||||
| 
						 | 
					@ -64,7 +155,7 @@ def norm(string):
 | 
				
			||||||
    for suffix_group in reversed(_stem_suffixes):
 | 
					    for suffix_group in reversed(_stem_suffixes):
 | 
				
			||||||
        length = len(suffix_group[0])
 | 
					        length = len(suffix_group[0])
 | 
				
			||||||
        if len(string) <= length:
 | 
					        if len(string) <= length:
 | 
				
			||||||
            break
 | 
					            continue
 | 
				
			||||||
        for suffix in suffix_group:
 | 
					        for suffix in suffix_group:
 | 
				
			||||||
            if string.endswith(suffix):
 | 
					            if string.endswith(suffix):
 | 
				
			||||||
                return string[:-length]
 | 
					                return string[:-length]
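
Review note on the `break` to `continue` change in `norm`: the suffix groups are walked from longest to shortest, so stopping at the first group that is too long for the word means the shorter groups are never tried. The sketch below reproduces only that loop. `TOY_SUFFIXES`, `strip_suffix` and the `stop_early` flag are hypothetical names, and the English-like suffixes are placeholders rather than spaCy data.

```python
# Toy suffix groups, ordered from shortest to longest like _stem_suffixes.
# These suffixes are illustrative placeholders, not the Hindi data.
TOY_SUFFIXES = [["s"], ["ing"], ["ation"]]


def strip_suffix(string, suffix_groups, stop_early):
    """Strip the longest matching suffix, trying longer groups first.

    stop_early=True mimics the old `break`; False mimics the new `continue`.
    """
    for suffix_group in reversed(suffix_groups):
        length = len(suffix_group[0])
        if len(string) <= length:
            if stop_early:
                break  # old behaviour: give up on all shorter groups too
            continue  # new behaviour: skip only this group
        for suffix in suffix_group:
            if string.endswith(suffix):
                return string[:-length]
    return string


# "cats" is shorter than the longest group ("ation"), so the old loop
# exits before the one-character group is ever checked.
assert strip_suffix("cats", TOY_SUFFIXES, stop_early=True) == "cats"
assert strip_suffix("cats", TOY_SUFFIXES, stop_early=False) == "cat"
```

Words longer than the longest suffix behave the same either way; only short words were affected by the early exit.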
 | 
				
			||||||
| 
						 | 
					@ -74,7 +165,7 @@ def norm(string):
 | 
				
			||||||
def like_num(text):
 | 
					def like_num(text):
 | 
				
			||||||
    if text.startswith(("+", "-", "±", "~")):
 | 
					    if text.startswith(("+", "-", "±", "~")):
 | 
				
			||||||
        text = text[1:]
 | 
					        text = text[1:]
 | 
				
			||||||
    text = text.replace(", ", "").replace(".", "")
 | 
					    text = text.replace(",", "").replace(".", "")
 | 
				
			||||||
    if text.isdigit():
 | 
					    if text.isdigit():
 | 
				
			||||||
        return True
 | 
					        return True
 | 
				
			||||||
    if text.count("/") == 1:
 | 
					    if text.count("/") == 1:
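
Review note on the separator change in `like_num`: replacing `", "` (comma plus space) never touches the commas inside a single token such as `1,00,000`, so `isdigit()` fails on it. The sketch below isolates the two `replace` chains; the helper names `strip_separators_old` and `strip_separators_new` are hypothetical and exist only for this comparison.

```python
def strip_separators_old(text):
    # Old hunk: only removes a comma that is followed by a space.
    return text.replace(", ", "").replace(".", "")


def strip_separators_new(text):
    # New hunk: removes every comma, so digit grouping is stripped.
    return text.replace(",", "").replace(".", "")


# Indian-style digit grouping, as in the numbering-system reference above.
token = "1,00,000"
assert strip_separators_old(token) == "1,00,000"
assert not strip_separators_old(token).isdigit()
assert strip_separators_new(token) == "100000"
assert strip_separators_new(token).isdigit()
```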
 | 
				
			||||||
| 
						 | 
					@ -83,6 +174,14 @@ def like_num(text):
 | 
				
			||||||
            return True
 | 
					            return True
 | 
				
			||||||
    if text.lower() in _num_words:
 | 
					    if text.lower() in _num_words:
 | 
				
			||||||
        return True
 | 
					        return True
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # check ordinal numbers
 | 
				
			||||||
 | 
					    # reference: http://www.englishkitab.com/Vocabulary/Numbers.html
 | 
				
			||||||
 | 
					    if text in _ordinal_words_one_to_ten:
 | 
				
			||||||
 | 
					        return True
 | 
				
			||||||
 | 
					    if text.endswith(_ordinal_suffix):
 | 
				
			||||||
 | 
					        if text[:-len(_ordinal_suffix)] in _eleven_to_beyond:
 | 
				
			||||||
 | 
					            return True
 | 
				
			||||||
    return False
 | 
					    return False
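
Review note on the ordinal check added to `like_num`: words for first to tenth are looked up directly, and higher ordinals are recognised by stripping the ordinal suffix and looking the stem up among the cardinals. A minimal sketch of that logic follows, assuming small subsets of the lists in this diff. The upper-case names and `looks_like_ordinal` are hypothetical stand-ins for the module-level names.

```python
# Illustrative subsets of the lists in this diff; the real lists are longer.
ELEVEN_TO_BEYOND = ["ग्यारह", "बारह", "तेरह"]
ORDINALS_ONE_TO_TEN = ["प्रथम", "पहला", "चौथा"]
ORDINAL_SUFFIX = "वाँ"


def looks_like_ordinal(text):
    """Return True for a listed ordinal or for cardinal + ordinal suffix."""
    if text in ORDINALS_ONE_TO_TEN:
        return True
    if text.endswith(ORDINAL_SUFFIX):
        # Drop the suffix and check whether a known cardinal remains.
        stem = text[: -len(ORDINAL_SUFFIX)]
        return stem in ELEVEN_TO_BEYOND
    return False


assert looks_like_ordinal("चौथा")                     # listed ordinal
assert looks_like_ordinal("बारह" + ORDINAL_SUFFIX)    # cardinal + suffix
assert not looks_like_ordinal("बारह")                 # plain cardinal
```

Because the stem must be a known cardinal, an arbitrary word that merely ends in the suffix is not treated as a number.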
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -19,4 +19,6 @@ sentences = [
 | 
				
			||||||
    "தன்னாட்சி கார்கள் காப்பீட்டு பொறுப்பை உற்பத்தியாளரிடம் மாற்றுகின்றன",
 | 
					    "தன்னாட்சி கார்கள் காப்பீட்டு பொறுப்பை உற்பத்தியாளரிடம் மாற்றுகின்றன",
 | 
				
			||||||
    "நடைபாதை விநியோக ரோபோக்களை தடை செய்வதை சான் பிரான்சிஸ்கோ கருதுகிறது",
 | 
					    "நடைபாதை விநியோக ரோபோக்களை தடை செய்வதை சான் பிரான்சிஸ்கோ கருதுகிறது",
 | 
				
			||||||
    "லண்டன் ஐக்கிய இராச்சியத்தில் ஒரு பெரிய நகரம்.",
 | 
					    "லண்டன் ஐக்கிய இராச்சியத்தில் ஒரு பெரிய நகரம்.",
 | 
				
			||||||
 | 
					    "என்ன வேலை செய்கிறீர்கள்?",
 | 
				
			||||||
 | 
					    "எந்த கல்லூரியில் படிக்கிறாய்?",
 | 
				
			||||||
]
 | 
					]
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -73,20 +73,16 @@ def like_num(text):
 | 
				
			||||||
        num, denom = text.split("/")
 | 
					        num, denom = text.split("/")
 | 
				
			||||||
        if num.isdigit() and denom.isdigit():
 | 
					        if num.isdigit() and denom.isdigit():
 | 
				
			||||||
            return True
 | 
					            return True
 | 
				
			||||||
 | 
					 | 
				
			||||||
    text_lower = text.lower()
 | 
					    text_lower = text.lower()
 | 
				
			||||||
 | 
					 | 
				
			||||||
    # Check cardinal number
 | 
					    # Check cardinal number
 | 
				
			||||||
    if text_lower in _num_words:
 | 
					    if text_lower in _num_words:
 | 
				
			||||||
        return True
 | 
					        return True
 | 
				
			||||||
 | 
					 | 
				
			||||||
    # Check ordinal number
 | 
					    # Check ordinal number
 | 
				
			||||||
    if text_lower in _ordinal_words:
 | 
					    if text_lower in _ordinal_words:
 | 
				
			||||||
        return True
 | 
					        return True
 | 
				
			||||||
    if text_lower.endswith(_ordinal_endings):
 | 
					    if text_lower.endswith(_ordinal_endings):
 | 
				
			||||||
        if text_lower[:-3].isdigit() or text_lower[:-4].isdigit():
 | 
					        if text_lower[:-3].isdigit() or text_lower[:-4].isdigit():
 | 
				
			||||||
            return True
 | 
					            return True
 | 
				
			||||||
 | 
					 | 
				
			||||||
    return False
 | 
					    return False
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -1,6 +1,3 @@
 | 
				
			||||||
# coding: utf8
 | 
					 | 
				
			||||||
from __future__ import unicode_literals
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
from ...symbols import NOUN, PROPN, PRON
 | 
					from ...symbols import NOUN, PROPN, PRON
 | 
				
			||||||
from ...errors import Errors
 | 
					from ...errors import Errors
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -125,6 +125,11 @@ def he_tokenizer():
 | 
				
			||||||
    return get_lang_class("he")().tokenizer
 | 
					    return get_lang_class("he")().tokenizer
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					@pytest.fixture(scope="session")
 | 
				
			||||||
 | 
					def hi_tokenizer():
 | 
				
			||||||
 | 
					    return get_lang_class("hi")().tokenizer
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@pytest.fixture(scope="session")
 | 
					@pytest.fixture(scope="session")
 | 
				
			||||||
def hr_tokenizer():
 | 
					def hr_tokenizer():
 | 
				
			||||||
    return get_lang_class("hr")().tokenizer
 | 
					    return get_lang_class("hr")().tokenizer
 | 
				
			||||||
| 
						 | 
					@ -240,11 +245,6 @@ def tr_tokenizer():
 | 
				
			||||||
    return get_lang_class("tr")().tokenizer
 | 
					    return get_lang_class("tr")().tokenizer
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@pytest.fixture(scope="session")
 | 
					 | 
				
			||||||
def tr_vocab():
 | 
					 | 
				
			||||||
    return get_lang_class("tr").Defaults.create_vocab()
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
@pytest.fixture(scope="session")
 | 
					@pytest.fixture(scope="session")
 | 
				
			||||||
def tt_tokenizer():
 | 
					def tt_tokenizer():
 | 
				
			||||||
    return get_lang_class("tt")().tokenizer
 | 
					    return get_lang_class("tt")().tokenizer
 | 
				
			||||||
| 
						 | 
					@ -297,11 +297,7 @@ def zh_tokenizer_pkuseg():
 | 
				
			||||||
                "segmenter": "pkuseg",
 | 
					                "segmenter": "pkuseg",
 | 
				
			||||||
            }
 | 
					            }
 | 
				
			||||||
        },
 | 
					        },
 | 
				
			||||||
        "initialize": {
 | 
					        "initialize": {"tokenizer": {"pkuseg_model": "web",}},
 | 
				
			||||||
            "tokenizer": {
 | 
					 | 
				
			||||||
                "pkuseg_model": "web",
 | 
					 | 
				
			||||||
            }
 | 
					 | 
				
			||||||
        },
 | 
					 | 
				
			||||||
    }
 | 
					    }
 | 
				
			||||||
    nlp = get_lang_class("zh").from_config(config)
 | 
					    nlp = get_lang_class("zh").from_config(config)
 | 
				
			||||||
    nlp.initialize()
 | 
					    nlp.initialize()
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
							
								
								
									
										0
									
								
								spacy/tests/lang/hi/__init__.py
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
							
								
								
									
										41
									
								
								spacy/tests/lang/hi/test_lex_attrs.py
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1,41 @@
 | 
				
			||||||
 | 
					import pytest
 | 
				
			||||||
 | 
					from spacy.lang.hi.lex_attrs import norm, like_num
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					def test_hi_tokenizer_handles_long_text(hi_tokenizer):
 | 
				
			||||||
 | 
					    text = """
 | 
				
			||||||
 | 
					ये कहानी 1900 के दशक की है। कौशल्या (स्मिता जयकर) को पता चलता है कि उसका
 | 
				
			||||||
 | 
					छोटा बेटा, देवदास (शाहरुख खान) वापस घर आ रहा है। देवदास 10 साल पहले कानून की
 | 
				
			||||||
 | 
					पढ़ाई करने के लिए इंग्लैंड गया था। उसके लौटने की खुशी में ये बात कौशल्या अपनी पड़ोस
 | 
				
			||||||
 | 
					में रहने वाली सुमित्रा (किरण खेर) को भी बता देती है। इस खबर से वो भी खुश हो जाती है।
 | 
				
			||||||
 | 
					"""
 | 
				
			||||||
 | 
					    tokens = hi_tokenizer(text)
 | 
				
			||||||
 | 
					    assert len(tokens) == 86
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					@pytest.mark.parametrize(
 | 
				
			||||||
 | 
					    "word,word_norm",
 | 
				
			||||||
 | 
					    [
 | 
				
			||||||
 | 
					        ("चलता", "चल"),
 | 
				
			||||||
 | 
					        ("पढ़ाई", "पढ़"),
 | 
				
			||||||
 | 
					        ("देती", "दे"),
 | 
				
			||||||
 | 
					        ("जाती", "ज"),
 | 
				
			||||||
 | 
					        ("मुस्कुराकर", "मुस्कुर"),
 | 
				
			||||||
 | 
					    ],
 | 
				
			||||||
 | 
					)
 | 
				
			||||||
 | 
					def test_hi_norm(word, word_norm):
 | 
				
			||||||
 | 
					    assert norm(word) == word_norm
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					@pytest.mark.parametrize(
 | 
				
			||||||
 | 
					    "word", ["१९८७", "1987", "१२,२६७", "उन्नीस", "पाँच", "नवासी", "५/१०"],
 | 
				
			||||||
 | 
					)
 | 
				
			||||||
 | 
					def test_hi_like_num(word):
 | 
				
			||||||
 | 
					    assert like_num(word)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					@pytest.mark.parametrize(
 | 
				
			||||||
 | 
					    "word", ["पहला", "तृतीय", "निन्यानवेवाँ", "उन्नीस", "तिहत्तरवाँ", "छत्तीसवाँ",],
 | 
				
			||||||
 | 
					)
 | 
				
			||||||
 | 
					def test_hi_like_num_ordinal_words(word):
 | 
				
			||||||
 | 
					    assert like_num(word)
 | 
				
			||||||
| 
						 | 
					@ -489,11 +489,11 @@ This allows you to write callbacks that consider the entire set of matched
 | 
				
			||||||
phrases, so that you can resolve overlaps and other conflicts in whatever way
 | 
					phrases, so that you can resolve overlaps and other conflicts in whatever way
 | 
				
			||||||
you prefer.
 | 
					you prefer.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| Argument  | Description                                                                                                                                        |
 | 
					| Argument  | Description                                                                                                                                       |
 | 
				
			||||||
| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
 | 
					| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
 | 
				
			||||||
| `matcher` | The matcher instance. ~~Matcher~~                                                                                                                  |
 | 
					| `matcher` | The matcher instance. ~~Matcher~~                                                                                                                 |
 | 
				
			||||||
| `doc`     | The document the matcher was used on. ~~Doc~~                                                                                                      |
 | 
					| `doc`     | The document the matcher was used on. ~~Doc~~                                                                                                     |
 | 
				
			||||||
| `i`       | Index of the current match (`matches[i]`). ~~int~~                                                                                                 |
 | 
					| `i`       | Index of the current match (`matches[i]`). ~~int~~                                                                                                |
 | 
				
			||||||
| `matches` | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. ~~List[Tuple[int, int, int]]~~ |
 | 
					| `matches` | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. ~~List[Tuple[int, int, int]]~~ |
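
As an illustration of the overlap handling described above, the following is a minimal sketch and not part of the commit. It assumes only the `(match_id, start, end)` tuple shape from the table. The names `resolve_overlaps`, `on_match` and `resolved` are hypothetical, and "keep the longest match" is just one possible policy.

```python
from typing import List, Tuple

# (match_id, start, end), the tuple shape described in the table above
Match = Tuple[int, int, int]

# Hypothetical container that the callback fills with the filtered matches
resolved: List[Match] = []


def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """Keep the longest matches and drop any match overlapping a kept one."""
    # Longest span first; ties are broken by the earliest start index.
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    kept: List[Match] = []
    taken = set()  # token indices already covered by a kept match
    for match_id, start, end in ordered:
        if any(index in taken for index in range(start, end)):
            continue
        taken.update(range(start, end))
        kept.append((match_id, start, end))
    # Return the surviving matches in document order.
    return sorted(kept, key=lambda m: m[1])


def on_match(matcher, doc, i, matches):
    """Callback with the four arguments listed in the table above."""
    # Assumption: filtering once, on the last match, is enough for this sketch.
    if i == len(matches) - 1:
        resolved[:] = resolve_overlaps(matches)
```

Because the callback receives the full `matches` list on every call, the filtering only needs to run once; here it is deferred to the last index.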
 | 
				
			||||||
 | 
					
 | 
				
			||||||
### Creating spans from matches {#matcher-spans}
 | 
					### Creating spans from matches {#matcher-spans}
 | 
				
			||||||
| 
						 | 
					@ -631,8 +631,8 @@ To get a quick overview of the results, you could collect all sentences
 | 
				
			||||||
containing a match and render them with the
 | 
					containing a match and render them with the
 | 
				
			||||||
[displaCy visualizer](/usage/visualizers). In the callback function, you'll have
 | 
					[displaCy visualizer](/usage/visualizers). In the callback function, you'll have
 | 
				
			||||||
access to the `start` and `end` of each match, as well as the parent `Doc`. This
 | 
					access to the `start` and `end` of each match, as well as the parent `Doc`. This
 | 
				
			||||||
lets you determine the sentence containing the match, `doc[start:end].sent`,
 | 
					lets you determine the sentence containing the match, `doc[start:end].sent`, and
 | 
				
			||||||
and calculate the start and end of the matched span within the sentence. Using
 | 
					calculate the start and end of the matched span within the sentence. Using
 | 
				
			||||||
displaCy in ["manual" mode](/usage/visualizers#manual-usage) lets you pass in a
 | 
					displaCy in ["manual" mode](/usage/visualizers#manual-usage) lets you pass in a
 | 
				
			||||||
list of dictionaries containing the text and entities to render.
 | 
					list of dictionaries containing the text and entities to render.
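
The offset arithmetic described above can be sketched in plain Python. This is an illustrative example and not part of the commit: the helper name `to_displacy_entry` and the `"MATCH"` label are assumptions, and the character offsets are passed in directly, where a real callback would read them from the matched span and its sentence.

```python
def to_displacy_entry(sent_text, sent_start_char, span_start_char,
                      span_end_char, label="MATCH"):
    """Build one dictionary in the shape displaCy's manual mode expects."""
    # Offsets arrive relative to the whole document; subtracting the
    # sentence's start makes them relative to the sentence text.
    return {
        "text": sent_text,
        "ents": [
            {
                "start": span_start_char - sent_start_char,
                "end": span_end_char - sent_start_char,
                "label": label,
            }
        ],
    }


doc_text = "Hello there. I like spaCy a lot."
sent_start = doc_text.index("I like")   # start of the second sentence
span_start = doc_text.index("spaCy")    # start of the matched span
entry = to_displacy_entry(
    doc_text[sent_start:], sent_start, span_start, span_start + len("spaCy")
)
```

A list of such dictionaries is what would then be handed to the visualizer in manual mode.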
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||