mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-11-04 01:48:04 +03:00 
			
		
		
		
	Merge branch 'master' of https://github.com/explosion/spaCy
		
						commit
						1b348389bb
					
				
							
								
								
									
										106
									
								
								.github/contributors/DuyguA.md
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1,106 @@
 | 
				
			||||||
 | 
					# spaCy contributor agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This spaCy Contributor Agreement (**"SCA"**) is based on the
 | 
				
			||||||
 | 
					[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 | 
				
			||||||
 | 
					The SCA applies to any contribution that you make to any product or project
 | 
				
			||||||
 | 
					managed by us (the **"project"**), and sets out the intellectual property rights
 | 
				
			||||||
 | 
					you grant to us in the contributed materials. The term **"us"** shall mean
 | 
				
			||||||
 | 
					[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 | 
				
			||||||
 | 
					**"you"** shall mean the person or entity identified below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you agree to be bound by these terms, fill in the information requested
 | 
				
			||||||
 | 
					below and include the filled-in version with your first pull request, under the
 | 
				
			||||||
 | 
					folder [`.github/contributors/`](/.github/contributors/). The name of the file
 | 
				
			||||||
 | 
					should be your GitHub username, with the extension `.md`. For example, the user
 | 
				
			||||||
 | 
					example_user would create the file `.github/contributors/example_user.md`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Read this agreement carefully before signing. These terms and conditions
 | 
				
			||||||
 | 
					constitute a binding legal agreement.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. The term "contribution" or "contributed materials" means any source code,
 | 
				
			||||||
 | 
					object code, patch, tool, sample, graphic, specification, manual,
 | 
				
			||||||
 | 
					documentation, or any other material posted or submitted by you to the project.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2. With respect to any worldwide copyrights, or copyright applications and
 | 
				
			||||||
 | 
					registrations, in your contribution:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you hereby assign to us joint ownership, and to the extent that such
 | 
				
			||||||
 | 
					    assignment is or becomes invalid, ineffective or unenforceable, you hereby
 | 
				
			||||||
 | 
					    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
 | 
				
			||||||
 | 
					    royalty-free, unrestricted license to exercise all rights under those
 | 
				
			||||||
 | 
					    copyrights. This includes, at our option, the right to sublicense these same
 | 
				
			||||||
 | 
					    rights to third parties through multiple levels of sublicensees or other
 | 
				
			||||||
 | 
					    licensing arrangements;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that each of us can do all things in relation to your
 | 
				
			||||||
 | 
					    contribution as if each of us were the sole owners, and if one of us makes
 | 
				
			||||||
 | 
					    a derivative work of your contribution, the one who makes the derivative
 | 
				
			||||||
 | 
					    work (or has it made) will be the sole owner of that derivative work;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that you will not assert any moral rights in your contribution
 | 
				
			||||||
 | 
					    against us, our licensees or transferees;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that we may register a copyright in your contribution and
 | 
				
			||||||
 | 
					    exercise all ownership rights associated with it; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that neither of us has any duty to consult with, obtain the
 | 
				
			||||||
 | 
					    consent of, pay or render an accounting to the other for any use or
 | 
				
			||||||
 | 
					    distribution of your contribution.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					3. With respect to any patents you own, or that you can license without payment
 | 
				
			||||||
 | 
					to any third party, you hereby grant to us a perpetual, irrevocable,
 | 
				
			||||||
 | 
					non-exclusive, worldwide, no-charge, royalty-free license to:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * make, have made, use, sell, offer to sell, import, and otherwise transfer
 | 
				
			||||||
 | 
					    your contribution in whole or in part, alone or in combination with or
 | 
				
			||||||
 | 
					    included in any product, work or materials arising out of the project to
 | 
				
			||||||
 | 
					    which your contribution was submitted, and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * at our option, to sublicense these same rights to third parties through
 | 
				
			||||||
 | 
					    multiple levels of sublicensees or other licensing arrangements.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					4. Except as set out above, you keep all right, title, and interest in your
 | 
				
			||||||
 | 
					contribution. The rights that you grant to us under these terms are effective
 | 
				
			||||||
 | 
					on the date you first submitted a contribution to us, even if your submission
 | 
				
			||||||
 | 
					took place before the date you sign these terms.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					5. You covenant, represent, warrant and agree that:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * Each contribution that you submit is and shall be an original work of
 | 
				
			||||||
 | 
					    authorship and you can legally grant the rights set out in this SCA;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * to the best of your knowledge, each contribution will not violate any
 | 
				
			||||||
 | 
					    third party's copyrights, trademarks, patents, or other intellectual
 | 
				
			||||||
 | 
					    property rights; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * each contribution shall be in compliance with U.S. export control laws and
 | 
				
			||||||
 | 
					    other applicable export and import laws. You agree to notify us if you
 | 
				
			||||||
 | 
					    become aware of any circumstance which would make any of the foregoing
 | 
				
			||||||
 | 
					    representations inaccurate in any respect. We may publicly disclose your 
 | 
				
			||||||
 | 
					    participation in the project, including the fact that you have signed the SCA.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					6. This SCA is governed by the laws of the State of California and applicable
 | 
				
			||||||
 | 
					U.S. Federal law. Any choice of law rules will not apply.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					7. Please place an “x” on one of the applicable statement below. Please do NOT
 | 
				
			||||||
 | 
					mark both statements:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [x] I am signing on behalf of myself as an individual and no other person
 | 
				
			||||||
 | 
					    or entity, including my employer, has or will have rights with respect to my
 | 
				
			||||||
 | 
					    contributions.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [ ] I am signing on behalf of my employer or a legal entity and I have the
 | 
				
			||||||
 | 
					    actual authority to contractually bind that entity.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Details
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					| Field                          | Entry                              |
 | 
				
			||||||
 | 
					|------------------------------- | --------------------               |
 | 
				
			||||||
 | 
					| Name                           | Duygu Altinok                      |
 | 
				
			||||||
 | 
					| Company name (if applicable)   |                                    |
 | 
				
			||||||
 | 
					| Title or role (if applicable)  |                                    |
 | 
				
			||||||
 | 
					| Date                           | 13 November 2017                   |
 | 
				
			||||||
 | 
					| GitHub username                | DuyguA                             |
 | 
				
			||||||
 | 
					| Website (optional)             |                                    |
 | 
				
			||||||
							
								
								
									
										106
									
								
								.github/contributors/abhi18av.md
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1,106 @@
 | 
				
			||||||
 | 
					# spaCy contributor agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This spaCy Contributor Agreement (**"SCA"**) is based on the
 | 
				
			||||||
 | 
					[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 | 
				
			||||||
 | 
					The SCA applies to any contribution that you make to any product or project
 | 
				
			||||||
 | 
					managed by us (the **"project"**), and sets out the intellectual property rights
 | 
				
			||||||
 | 
					you grant to us in the contributed materials. The term **"us"** shall mean
 | 
				
			||||||
 | 
					[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 | 
				
			||||||
 | 
					**"you"** shall mean the person or entity identified below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you agree to be bound by these terms, fill in the information requested
 | 
				
			||||||
 | 
					below and include the filled-in version with your first pull request, under the
 | 
				
			||||||
 | 
					folder [`.github/contributors/`](/.github/contributors/). The name of the file
 | 
				
			||||||
 | 
					should be your GitHub username, with the extension `.md`. For example, the user
 | 
				
			||||||
 | 
					example_user would create the file `.github/contributors/example_user.md`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Read this agreement carefully before signing. These terms and conditions
 | 
				
			||||||
 | 
					constitute a binding legal agreement.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Agreement
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. The term "contribution" or "contributed materials" means any source code,
 | 
				
			||||||
 | 
					object code, patch, tool, sample, graphic, specification, manual,
 | 
				
			||||||
 | 
					documentation, or any other material posted or submitted by you to the project.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2. With respect to any worldwide copyrights, or copyright applications and
 | 
				
			||||||
 | 
					registrations, in your contribution:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you hereby assign to us joint ownership, and to the extent that such
 | 
				
			||||||
 | 
					    assignment is or becomes invalid, ineffective or unenforceable, you hereby
 | 
				
			||||||
 | 
					    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
 | 
				
			||||||
 | 
					    royalty-free, unrestricted license to exercise all rights under those
 | 
				
			||||||
 | 
					    copyrights. This includes, at our option, the right to sublicense these same
 | 
				
			||||||
 | 
					    rights to third parties through multiple levels of sublicensees or other
 | 
				
			||||||
 | 
					    licensing arrangements;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that each of us can do all things in relation to your
 | 
				
			||||||
 | 
					    contribution as if each of us were the sole owners, and if one of us makes
 | 
				
			||||||
 | 
					    a derivative work of your contribution, the one who makes the derivative
 | 
				
			||||||
 | 
					    work (or has it made) will be the sole owner of that derivative work;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that you will not assert any moral rights in your contribution
 | 
				
			||||||
 | 
					    against us, our licensees or transferees;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that we may register a copyright in your contribution and
 | 
				
			||||||
 | 
					    exercise all ownership rights associated with it; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * you agree that neither of us has any duty to consult with, obtain the
 | 
				
			||||||
 | 
					    consent of, pay or render an accounting to the other for any use or
 | 
				
			||||||
 | 
					    distribution of your contribution.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					3. With respect to any patents you own, or that you can license without payment
 | 
				
			||||||
 | 
					to any third party, you hereby grant to us a perpetual, irrevocable,
 | 
				
			||||||
 | 
					non-exclusive, worldwide, no-charge, royalty-free license to:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * make, have made, use, sell, offer to sell, import, and otherwise transfer
 | 
				
			||||||
 | 
					    your contribution in whole or in part, alone or in combination with or
 | 
				
			||||||
 | 
					    included in any product, work or materials arising out of the project to
 | 
				
			||||||
 | 
					    which your contribution was submitted, and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * at our option, to sublicense these same rights to third parties through
 | 
				
			||||||
 | 
					    multiple levels of sublicensees or other licensing arrangements.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					4. Except as set out above, you keep all right, title, and interest in your
 | 
				
			||||||
 | 
					contribution. The rights that you grant to us under these terms are effective
 | 
				
			||||||
 | 
					on the date you first submitted a contribution to us, even if your submission
 | 
				
			||||||
 | 
					took place before the date you sign these terms.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					5. You covenant, represent, warrant and agree that:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * Each contribution that you submit is and shall be an original work of
 | 
				
			||||||
 | 
					    authorship and you can legally grant the rights set out in this SCA;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * to the best of your knowledge, each contribution will not violate any
 | 
				
			||||||
 | 
					    third party's copyrights, trademarks, patents, or other intellectual
 | 
				
			||||||
 | 
					    property rights; and
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * each contribution shall be in compliance with U.S. export control laws and
 | 
				
			||||||
 | 
					    other applicable export and import laws. You agree to notify us if you
 | 
				
			||||||
 | 
					    become aware of any circumstance which would make any of the foregoing
 | 
				
			||||||
 | 
					    representations inaccurate in any respect. We may publicly disclose your 
 | 
				
			||||||
 | 
					    participation in the project, including the fact that you have signed the SCA.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					6. This SCA is governed by the laws of the State of California and applicable
 | 
				
			||||||
 | 
					U.S. Federal law. Any choice of law rules will not apply.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					7. Please place an “x” on one of the applicable statement below. Please do NOT
 | 
				
			||||||
 | 
					mark both statements:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [x] I am signing on behalf of myself as an individual and no other person
 | 
				
			||||||
 | 
					    or entity, including my employer, has or will have rights with respect to my
 | 
				
			||||||
 | 
					    contributions.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    * [ ] I am signing on behalf of my employer or a legal entity and I have the
 | 
				
			||||||
 | 
					    actual authority to contractually bind that entity.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Contributor Details
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					| Field                          | Entry                              |
 | 
				
			||||||
 | 
					|------------------------------- | --------------------               |
 | 
				
			||||||
 | 
					| Name                           | Abhinav Sharma                     |
 | 
				
			||||||
 | 
					| Company name (if applicable)   | Fourtek I.T. Solutions Pvt. Ltd.   |
 | 
				
			||||||
 | 
					| Title or role (if applicable)  | Machine Learning Engineer          |
 | 
				
			||||||
 | 
					| Date                           | 3 November 2017                   |
 | 
				
			||||||
 | 
					| GitHub username                | abhi18av                           |
 | 
				
			||||||
 | 
					| Website (optional)             | https://abhi18av.github.io/        |
 | 
				
			||||||
| 
						 | 
					@ -88,8 +88,10 @@ requests:
 | 
				
			||||||
| [`models`](https://github.com/explosion/spaCy/labels/models), `language / [name]` | Issues related to the specific [models](https://github.com/explosion/spacy-models), languages and data |
 | 
					| [`models`](https://github.com/explosion/spaCy/labels/models), `language / [name]` | Issues related to the specific [models](https://github.com/explosion/spacy-models), languages and data |
 | 
				
			||||||
| [`linux`](https://github.com/explosion/spaCy/labels/linux), [`osx`](https://github.com/explosion/spaCy/labels/osx), [`windows`](https://github.com/explosion/spaCy/labels/windows) | Issues related to the specific operating systems |
 | 
					| [`linux`](https://github.com/explosion/spaCy/labels/linux), [`osx`](https://github.com/explosion/spaCy/labels/osx), [`windows`](https://github.com/explosion/spaCy/labels/windows) | Issues related to the specific operating systems |
 | 
				
			||||||
| [`pip`](https://github.com/explosion/spaCy/labels/pip), [`conda`](https://github.com/explosion/spaCy/labels/conda) | Issues related to the specific package managers |
 | 
					| [`pip`](https://github.com/explosion/spaCy/labels/pip), [`conda`](https://github.com/explosion/spaCy/labels/conda) | Issues related to the specific package managers |
 | 
				
			||||||
| [`wip`](https://github.com/explosion/spaCy/labels/wip) | Work in progress, mostly used for pull requests. |
 | 
					| [`wip`](https://github.com/explosion/spaCy/labels/wip) | Work in progress, mostly used for pull requests |
 | 
				
			||||||
 | 
					| [`v1`](https://github.com/explosion/spaCy/labels/v1) | Reports related to spaCy v1.x |
 | 
				
			||||||
| [`duplicate`](https://github.com/explosion/spaCy/labels/duplicate) | Duplicates, i.e. issues that have been reported before |
 | 
					| [`duplicate`](https://github.com/explosion/spaCy/labels/duplicate) | Duplicates, i.e. issues that have been reported before |
 | 
				
			||||||
 | 
					| [`third-party`](https://github.com/explosion/spaCy/labels/third-party) | Issues related to third-party packages and services |
 | 
				
			||||||
| [`meta`](https://github.com/explosion/spaCy/labels/meta) | Meta topics, e.g. repo organisation and issue management |
 | 
					| [`meta`](https://github.com/explosion/spaCy/labels/meta) | Meta topics, e.g. repo organisation and issue management |
 | 
				
			||||||
| [`help wanted`](https://github.com/explosion/spaCy/labels/help%20wanted), [`help wanted (easy)`](https://github.com/explosion/spaCy/labels/help%20wanted%20%28easy%29) | Requests for contributions |
 | 
					| [`help wanted`](https://github.com/explosion/spaCy/labels/help%20wanted), [`help wanted (easy)`](https://github.com/explosion/spaCy/labels/help%20wanted%20%28easy%29) | Requests for contributions |
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -30,7 +30,7 @@ def main(vectors_loc, lang=None):
 | 
				
			||||||
        nlp.vocab.reset_vectors(width=int(nr_dim))
 | 
					        nlp.vocab.reset_vectors(width=int(nr_dim))
 | 
				
			||||||
        for line in file_:
 | 
					        for line in file_:
 | 
				
			||||||
            line = line.decode('utf8')
 | 
					            line = line.decode('utf8')
 | 
				
			||||||
            pieces = line.split()
 | 
					            pieces = line.rsplit(' ', int(nr_dim))
 | 
				
			||||||
            word = pieces[0]
 | 
					            word = pieces[0]
 | 
				
			||||||
            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
 | 
					            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
 | 
				
			||||||
            nlp.vocab.set_vector(word, vector)  # add the vectors to the vocab
 | 
					            nlp.vocab.set_vector(word, vector)  # add the vectors to the vocab
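Review note: the hunk above swaps a plain `split()` for a right-hand split bounded by the vector width, presumably so that a token containing spaces stays in one piece. A minimal sketch of that parsing step follows, assuming each row is `<word> <v1> ... <vN>` separated by single spaces. `parse_vector_line` is a hypothetical helper for illustration and is not part of spaCy; it returns a plain list where the script wraps the values in `numpy.asarray`.

```python
def parse_vector_line(line, nr_dim):
    """Split one text-format vector row into (word, values).

    Hypothetical helper: assumes "<word> <v1> ... <vN>" with single spaces.
    """
    # maxsplit must be an integer, so convert in case the width was read
    # from the file header as text.
    width = int(nr_dim)
    # Splitting from the right at most `width` times leaves any spaces
    # inside the word itself untouched in pieces[0].
    pieces = line.rstrip().rsplit(' ', width)
    word = pieces[0]
    values = [float(v) for v in pieces[1:]]
    return word, values


word, values = parse_vector_line("New York 0.1 0.2 0.3\n", "3")
print(word, len(values))
```

The `rstrip()` call is an addition in this sketch, not something the hunk contains: it guards against rows that end in a trailing space before the newline, which would otherwise produce an empty last piece.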
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -5,14 +5,23 @@ from __future__ import unicode_literals
 | 
				
			||||||
# Source: https://github.com/taranjeet/hindi-tokenizer/blob/master/stopwords.txt
 | 
					# Source: https://github.com/taranjeet/hindi-tokenizer/blob/master/stopwords.txt
 | 
				
			||||||
 | 
					
 | 
				
			||||||
STOP_WORDS = set("""
 | 
					STOP_WORDS = set("""
 | 
				
			||||||
 | 
					अंदर
 | 
				
			||||||
अत
 | 
					अत
 | 
				
			||||||
 | 
					अदि
 | 
				
			||||||
 | 
					अप
 | 
				
			||||||
अपना
 | 
					अपना
 | 
				
			||||||
 | 
					अपनि
 | 
				
			||||||
अपनी
 | 
					अपनी
 | 
				
			||||||
अपने
 | 
					अपने
 | 
				
			||||||
 | 
					अभि
 | 
				
			||||||
अभी
 | 
					अभी
 | 
				
			||||||
अंदर
 | 
					अंदर
 | 
				
			||||||
आदि
 | 
					आदि
 | 
				
			||||||
आप
 | 
					आप
 | 
				
			||||||
 | 
					इंहिं
 | 
				
			||||||
 | 
					इंहें
 | 
				
			||||||
 | 
					इंहों
 | 
				
			||||||
 | 
					इतयादि
 | 
				
			||||||
इत्यादि
 | 
					इत्यादि
 | 
				
			||||||
इन
 | 
					इन
 | 
				
			||||||
इनका
 | 
					इनका
 | 
				
			||||||
| 
						 | 
					@ -21,13 +30,19 @@ STOP_WORDS = set("""
 | 
				
			||||||
इन्हों
 | 
					इन्हों
 | 
				
			||||||
इस
 | 
					इस
 | 
				
			||||||
इसका
 | 
					इसका
 | 
				
			||||||
 | 
					इसकि
 | 
				
			||||||
इसकी
 | 
					इसकी
 | 
				
			||||||
इसके
 | 
					इसके
 | 
				
			||||||
इसमें
 | 
					इसमें
 | 
				
			||||||
 | 
					इसि
 | 
				
			||||||
इसी
 | 
					इसी
 | 
				
			||||||
इसे
 | 
					इसे
 | 
				
			||||||
 | 
					उंहिं
 | 
				
			||||||
 | 
					उंहें
 | 
				
			||||||
 | 
					उंहों
 | 
				
			||||||
उन
 | 
					उन
 | 
				
			||||||
उनका
 | 
					उनका
 | 
				
			||||||
 | 
					उनकि
 | 
				
			||||||
उनकी
 | 
					उनकी
 | 
				
			||||||
उनके
 | 
					उनके
 | 
				
			||||||
उनको
 | 
					उनको
 | 
				
			||||||
| 
						 | 
					@ -36,13 +51,17 @@ STOP_WORDS = set("""
 | 
				
			||||||
उन्हों
 | 
					उन्हों
 | 
				
			||||||
उस
 | 
					उस
 | 
				
			||||||
उसके
 | 
					उसके
 | 
				
			||||||
 | 
					उसि
 | 
				
			||||||
उसी
 | 
					उसी
 | 
				
			||||||
उसे
 | 
					उसे
 | 
				
			||||||
एक
 | 
					एक
 | 
				
			||||||
एवं
 | 
					एवं
 | 
				
			||||||
एस
 | 
					एस
 | 
				
			||||||
 | 
					एसे
 | 
				
			||||||
ऐसे
 | 
					ऐसे
 | 
				
			||||||
 | 
					ओर
 | 
				
			||||||
और
 | 
					और
 | 
				
			||||||
 | 
					कइ
 | 
				
			||||||
कई
 | 
					कई
 | 
				
			||||||
कर
 | 
					कर
 | 
				
			||||||
करता
 | 
					करता
 | 
				
			||||||
| 
						 | 
					@ -53,14 +72,18 @@ STOP_WORDS = set("""
 | 
				
			||||||
कहते
 | 
					कहते
 | 
				
			||||||
कहा
 | 
					कहा
 | 
				
			||||||
का
 | 
					का
 | 
				
			||||||
 | 
					काफि
 | 
				
			||||||
काफ़ी
 | 
					काफ़ी
 | 
				
			||||||
कि
 | 
					कि
 | 
				
			||||||
 | 
					किंहें
 | 
				
			||||||
 | 
					किंहों
 | 
				
			||||||
कितना
 | 
					कितना
 | 
				
			||||||
किन्हें
 | 
					किन्हें
 | 
				
			||||||
किन्हों
 | 
					किन्हों
 | 
				
			||||||
किया
 | 
					किया
 | 
				
			||||||
किर
 | 
					किर
 | 
				
			||||||
किस
 | 
					किस
 | 
				
			||||||
 | 
					किसि
 | 
				
			||||||
किसी
 | 
					किसी
 | 
				
			||||||
किसे
 | 
					किसे
 | 
				
			||||||
की
 | 
					की
 | 
				
			||||||
| 
						 | 
					@ -68,27 +91,38 @@ STOP_WORDS = set("""
 | 
				
			||||||
कुल
 | 
					कुल
 | 
				
			||||||
के
 | 
					के
 | 
				
			||||||
को
 | 
					को
 | 
				
			||||||
 | 
					कोइ
 | 
				
			||||||
कोई
 | 
					कोई
 | 
				
			||||||
 | 
					कोन
 | 
				
			||||||
 | 
					कोनसा
 | 
				
			||||||
कौन
 | 
					कौन
 | 
				
			||||||
कौनसा
 | 
					कौनसा
 | 
				
			||||||
गया
 | 
					गया
 | 
				
			||||||
घर
 | 
					घर
 | 
				
			||||||
जब
 | 
					जब
 | 
				
			||||||
जहाँ
 | 
					जहाँ
 | 
				
			||||||
 | 
					जहां
 | 
				
			||||||
जा
 | 
					जा
 | 
				
			||||||
 | 
					जिंहें
 | 
				
			||||||
 | 
					जिंहों
 | 
				
			||||||
जितना
 | 
					जितना
 | 
				
			||||||
 | 
					जिधर
 | 
				
			||||||
जिन
 | 
					जिन
 | 
				
			||||||
जिन्हें
 | 
					जिन्हें
 | 
				
			||||||
जिन्हों
 | 
					जिन्हों
 | 
				
			||||||
जिस
 | 
					जिस
 | 
				
			||||||
जिसे
 | 
					जिसे
 | 
				
			||||||
जीधर
 | 
					जीधर
 | 
				
			||||||
 | 
					जेसा
 | 
				
			||||||
 | 
					जेसे
 | 
				
			||||||
जैसा
 | 
					जैसा
 | 
				
			||||||
जैसे
 | 
					जैसे
 | 
				
			||||||
जो
 | 
					जो
 | 
				
			||||||
तक
 | 
					तक
 | 
				
			||||||
तब
 | 
					तब
 | 
				
			||||||
तरह
 | 
					तरह
 | 
				
			||||||
 | 
					तिंहें
 | 
				
			||||||
 | 
					तिंहों
 | 
				
			||||||
तिन
 | 
					तिन
 | 
				
			||||||
तिन्हें
 | 
					तिन्हें
 | 
				
			||||||
तिन्हों
 | 
					तिन्हों
 | 
				
			||||||
| 
						 | 
					@ -96,32 +130,41 @@ STOP_WORDS = set("""
 | 
				
			||||||
तिसे
 | 
					तिसे
 | 
				
			||||||
तो
 | 
					तो
 | 
				
			||||||
था
 | 
					था
 | 
				
			||||||
 | 
					थि
 | 
				
			||||||
थी
 | 
					थी
 | 
				
			||||||
थे
 | 
					थे
 | 
				
			||||||
दबारा
 | 
					दबारा
 | 
				
			||||||
 | 
					दवारा
 | 
				
			||||||
दिया
 | 
					दिया
 | 
				
			||||||
दुसरा
 | 
					दुसरा
 | 
				
			||||||
 | 
					दुसरे
 | 
				
			||||||
दूसरे
 | 
					दूसरे
 | 
				
			||||||
दो
 | 
					दो
 | 
				
			||||||
द्वारा
 | 
					द्वारा
 | 
				
			||||||
न
 | 
					न
 | 
				
			||||||
नके
 | 
					नहिं
 | 
				
			||||||
नहीं
 | 
					नहीं
 | 
				
			||||||
ना
 | 
					ना
 | 
				
			||||||
 | 
					निचे
 | 
				
			||||||
निहायत
 | 
					निहायत
 | 
				
			||||||
नीचे
 | 
					नीचे
 | 
				
			||||||
ने
 | 
					ने
 | 
				
			||||||
पर
 | 
					पर
 | 
				
			||||||
पहले
 | 
					पहले
 | 
				
			||||||
 | 
					पुरा
 | 
				
			||||||
पूरा
 | 
					पूरा
 | 
				
			||||||
पे
 | 
					पे
 | 
				
			||||||
फिर
 | 
					फिर
 | 
				
			||||||
 | 
					बनि
 | 
				
			||||||
बनी
 | 
					बनी
 | 
				
			||||||
 | 
					बहि
 | 
				
			||||||
बही
 | 
					बही
 | 
				
			||||||
बहुत
 | 
					बहुत
 | 
				
			||||||
बाद
 | 
					बाद
 | 
				
			||||||
बाला
 | 
					बाला
 | 
				
			||||||
बिलकुल
 | 
					बिलकुल
 | 
				
			||||||
 | 
					भि
 | 
				
			||||||
 | 
					भितर
 | 
				
			||||||
भी
 | 
					भी
 | 
				
			||||||
भीतर
 | 
					भीतर
 | 
				
			||||||
मगर
 | 
					मगर
 | 
				
			||||||
| 
						 | 
					@ -131,11 +174,14 @@ STOP_WORDS = set("""
 | 
				
			||||||
यदि
 | 
					यदि
 | 
				
			||||||
यह
 | 
					यह
 | 
				
			||||||
यहाँ
 | 
					यहाँ
 | 
				
			||||||
 | 
					यहां
 | 
				
			||||||
 | 
					यहि
 | 
				
			||||||
यही
 | 
					यही
 | 
				
			||||||
या
 | 
					या
 | 
				
			||||||
यिह
 | 
					यिह
 | 
				
			||||||
ये
 | 
					ये
 | 
				
			||||||
रखें
 | 
					रखें
 | 
				
			||||||
 | 
					रवासा
 | 
				
			||||||
रहा
 | 
					रहा
 | 
				
			||||||
रहे
 | 
					रहे
 | 
				
			||||||
ऱ्वासा
 | 
					ऱ्वासा
 | 
				
			||||||
| 
						 | 
					@ -143,17 +189,24 @@ STOP_WORDS = set("""
 | 
				
			||||||
लिये
 | 
					लिये
 | 
				
			||||||
लेकिन
 | 
					लेकिन
 | 
				
			||||||
व
 | 
					व
 | 
				
			||||||
 | 
					वगेरह
 | 
				
			||||||
वग़ैरह
 | 
					वग़ैरह
 | 
				
			||||||
 | 
					वरग
 | 
				
			||||||
वर्ग
 | 
					वर्ग
 | 
				
			||||||
वह
 | 
					वह
 | 
				
			||||||
वहाँ
 | 
					वहाँ
 | 
				
			||||||
 | 
					वहां
 | 
				
			||||||
 | 
					वहिं
 | 
				
			||||||
वहीं
 | 
					वहीं
 | 
				
			||||||
वाले
 | 
					वाले
 | 
				
			||||||
वुह
 | 
					वुह
 | 
				
			||||||
वे
 | 
					वे
 | 
				
			||||||
 | 
					वग़ैरह
 | 
				
			||||||
 | 
					संग
 | 
				
			||||||
सकता
 | 
					सकता
 | 
				
			||||||
सकते
 | 
					सकते
 | 
				
			||||||
सबसे
 | 
					सबसे
 | 
				
			||||||
 | 
					सभि
 | 
				
			||||||
सभी
 | 
					सभी
 | 
				
			||||||
साथ
 | 
					साथ
 | 
				
			||||||
साबुत
 | 
					साबुत
 | 
				
			||||||
| 
						 | 
					@ -162,16 +215,23 @@ STOP_WORDS = set("""
 | 
				
			||||||
से
 | 
					से
 | 
				
			||||||
सो
 | 
					सो
 | 
				
			||||||
संग
 | 
					संग
 | 
				
			||||||
 | 
					हि
 | 
				
			||||||
ही
 | 
					ही
 | 
				
			||||||
 | 
					हुअ
 | 
				
			||||||
हुआ
 | 
					हुआ
 | 
				
			||||||
 | 
					हुइ
 | 
				
			||||||
हुई
 | 
					हुई
 | 
				
			||||||
हुए
 | 
					हुए
 | 
				
			||||||
 | 
					हे
 | 
				
			||||||
 | 
					हें
 | 
				
			||||||
है
 | 
					है
 | 
				
			||||||
हैं
 | 
					हैं
 | 
				
			||||||
हो
 | 
					हो
 | 
				
			||||||
होता
 | 
					होता
 | 
				
			||||||
 | 
					होति
 | 
				
			||||||
होती
 | 
					होती
 | 
				
			||||||
होते
 | 
					होते
 | 
				
			||||||
होना
 | 
					होना
 | 
				
			||||||
होने
 | 
					होने
 | 
				
			||||||
 | 
					
 | 
				
			||||||
""".split())
 | 
					""".split())
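Review note: some of the added entries (for example `अंदर` and `संग`) now appear twice in the block. Because the list is built with `set(...split())`, repeats are harmless. A small sketch using a short excerpt of words taken from the diff, not the full list:

```python
# Short excerpt for illustration only; the real list is much longer.
STOP_WORDS = set("""
अंदर
अत
अंदर
संग
संग
""".split())

# str.split() with no arguments splits on any whitespace, including the
# newlines between entries, and set() drops the repeated words.
print(len(STOP_WORDS))
print("संग" in STOP_WORDS)
```

The duplicates could still be removed from the source for readability, but they do not change the resulting set.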
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -82,7 +82,7 @@
 | 
				
			||||||
            }
 | 
					            }
 | 
				
			||||||
        ],
 | 
					        ],
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        "V_CSS": "2.0.0",
 | 
					        "V_CSS": "2.0.1",
 | 
				
			||||||
        "V_JS": "2.0.1",
 | 
					        "V_JS": "2.0.1",
 | 
				
			||||||
        "DEFAULT_SYNTAX": "python",
 | 
					        "DEFAULT_SYNTAX": "python",
 | 
				
			||||||
        "ANALYTICS": "UA-58931649-1",
 | 
					        "ANALYTICS": "UA-58931649-1",
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -312,6 +312,14 @@ mixin github(repo, file, height, alt_file, language)
 | 
				
			||||||
                +button(gh(repo, alt_file || file), false, "primary", "small") View on GitHub
 | 
					                +button(gh(repo, alt_file || file), false, "primary", "small") View on GitHub
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					//- Youtube video embed
 | 
				
			||||||
 | 
					    id    - [string] ID of YouTube video.
 | 
				
			||||||
 | 
					    ratio - [string] Video ratio, "16x9" or "4x3".
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					mixin youtube(id, ratio)
 | 
				
			||||||
 | 
					    figure.o-video.o-block(class="o-video--" + (ratio || "16x9"))
 | 
				
			||||||
 | 
					        iframe.o-video__iframe(src="https://www.youtube.com/embed/#{id}" frameborder="0" height="500" allowfullscreen)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
//- Images / figures
 | 
					//- Images / figures
 | 
				
			||||||
    url     - [string] url or path to image
 | 
					    url     - [string] url or path to image
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -177,6 +177,22 @@
 | 
				
			||||||
    border-radius: $border-radius
 | 
					    border-radius: $border-radius
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					//- Responsive Video embeds
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.o-video
 | 
				
			||||||
 | 
					    position: relative
 | 
				
			||||||
 | 
					    height: 0
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @each $ratio1, $ratio2 in (16, 9), (4, 3)
 | 
				
			||||||
 | 
					        &.o-video--#{$ratio1}x#{$ratio2}
 | 
				
			||||||
 | 
					            padding-bottom: (100% * $ratio2 / $ratio1)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.o-video__iframe
 | 
				
			||||||
 | 
					    @include position(absolute, top, left, 0, 0)
 | 
				
			||||||
 | 
					    @include size(100%)
 | 
				
			||||||
 | 
					    border-radius: var(--border-radius)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
//- Form fields
 | 
					//- Form fields
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.o-field
 | 
					.o-field
 | 
				
			||||||
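The `.o-video` rules in the hunk above rely on the padding-bottom technique: the container has `height: 0`, and a percentage `padding-bottom` (which is resolved against the element's width) reserves the vertical space for the absolutely positioned iframe. As a rough check of the values the `@each` loop would produce, here is a small Python sketch of the same arithmetic. The function name `padding_bottom_percent` is made up for illustration and does not appear in the stylesheet.

```python
def padding_bottom_percent(ratio_width, ratio_height):
    """Mirror the Sass expression (100% * $ratio2 / $ratio1) as a plain number."""
    return 100.0 * ratio_height / ratio_width


# The same two ratios the Sass loop iterates over: (16, 9) and (4, 3).
for width, height in [(16, 9), (4, 3)]:
    percent = padding_bottom_percent(width, height)
    print(f".o-video--{width}x{height} -> padding-bottom: {percent}%")
```

For a 16x9 embed this gives 56.25%, and for 4x3 it gives 75.0%, so the box height follows the container width at the chosen ratio.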
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -376,7 +376,7 @@ p
 | 
				
			||||||
 | 
					
 | 
				
			||||||
p
 | 
					p
 | 
				
			||||||
    |  Here's an example from the English
 | 
					    |  Here's an example from the English
 | 
				
			||||||
    |  #[+src(gh("spaCy", "spacy/en/lang/lex_attrs.py")) #[code lex_attrs.py]]:
 | 
					    |  #[+src(gh("spaCy", "spacy/lang/en/lex_attrs.py")) #[code lex_attrs.py]]:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
+code("lex_attrs.py").
 | 
					+code("lex_attrs.py").
 | 
				
			||||||
    _num_words = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
 | 
					    _num_words = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -166,6 +166,7 @@
 | 
				
			||||||
            "Demos & Visualizations": "demos",
 | 
					            "Demos & Visualizations": "demos",
 | 
				
			||||||
            "Books & Courses": "books",
 | 
					            "Books & Courses": "books",
 | 
				
			||||||
            "Jupyter Notebooks": "notebooks",
 | 
					            "Jupyter Notebooks": "notebooks",
 | 
				
			||||||
 | 
					            "Videos": "videos",
 | 
				
			||||||
            "Research": "research"
 | 
					            "Research": "research"
 | 
				
			||||||
        }
 | 
					        }
 | 
				
			||||||
    },
 | 
					    },
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -184,7 +184,8 @@ p
 | 
				
			||||||
+h(4, "source-windows") Windows
 | 
					+h(4, "source-windows") Windows
 | 
				
			||||||
 | 
					
 | 
				
			||||||
p
 | 
					p
 | 
				
			||||||
    |  Install a version of
 | 
					    |  Install a version of the
 | 
				
			||||||
 | 
					    |  #[+a("http://landinghub.visualstudio.com/visual-cpp-build-tools") Visual C++ Build Tools] or
 | 
				
			||||||
    |  #[+a("https://www.visualstudio.com/vs/visual-studio-express/") Visual Studio Express]
 | 
					    |  #[+a("https://www.visualstudio.com/vs/visual-studio-express/") Visual Studio Express]
 | 
				
			||||||
    |  that matches the version that was used to compile your Python
 | 
					    |  that matches the version that was used to compile your Python
 | 
				
			||||||
    |  interpreter. For official distributions these are:
 | 
					    |  interpreter. For official distributions these are:
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -39,7 +39,7 @@ p
 | 
				
			||||||
        return doc
 | 
					        return doc
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    nlp = spacy.load('en')
 | 
					    nlp = spacy.load('en')
 | 
				
			||||||
    nlp.pipeline.add_pipe(my_component, name='print_info', first=True)
 | 
					    nlp.add_pipe(my_component, name='print_info', first=True)
 | 
				
			||||||
    print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner']
 | 
					    print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner']
 | 
				
			||||||
    doc = nlp(u"This is a sentence.")
 | 
					    doc = nlp(u"This is a sentence.")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
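The hunk above changes the documented call from `nlp.pipeline.add_pipe(...)` to `nlp.add_pipe(...)`. The effect of `first=True` on component order can be sketched without spaCy installed. The `TinyPipeline` class below is a hypothetical stand-in written only for this illustration. It is not spaCy's `Language` class and it models only the `name` and `first` arguments shown in the diff.

```python
class TinyPipeline:
    """Hypothetical stand-in for a pipeline container (not part of spaCy)."""

    def __init__(self, names):
        # Each entry is a (name, component) pair; placeholders return doc unchanged.
        self.pipeline = [(name, lambda doc: doc) for name in names]

    @property
    def pipe_names(self):
        return [name for name, _ in self.pipeline]

    def add_pipe(self, component, name=None, first=False):
        # Fall back to the function name when no explicit name is given.
        name = name or getattr(component, "__name__", "component")
        if name in self.pipe_names:
            raise ValueError(f"{name!r} is already in the pipeline")
        if first:
            self.pipeline.insert(0, (name, component))
        else:
            self.pipeline.append((name, component))

    def __call__(self, doc):
        # Run every component in order, passing the doc along.
        for _, component in self.pipeline:
            doc = component(doc)
        return doc


def my_component(doc):
    # Here a "doc" is just a list, so the component can leave a trace.
    doc.append("print_info ran")
    return doc


nlp = TinyPipeline(["tagger", "parser", "ner"])
nlp.add_pipe(my_component, name="print_info", first=True)
print(nlp.pipe_names)  # ['print_info', 'tagger', 'parser', 'ner']
```

The printed order matches the comment in the diffed example: the custom component runs before the three pre-existing ones.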
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -55,6 +55,6 @@ p
 | 
				
			||||||
p
 | 
					p
 | 
				
			||||||
    |  While punctuation rules are usually pretty general, tokenizer exceptions
 | 
					    |  While punctuation rules are usually pretty general, tokenizer exceptions
 | 
				
			||||||
    |  strongly depend on the specifics of the individual language. This is
 | 
					    |  strongly depend on the specifics of the individual language. This is
 | 
				
			||||||
    |  why each #[+a("/models/#languages") available language] has its
 | 
					    |  why each #[+a("/usage/models#languages") available language] has its
 | 
				
			||||||
    |  own subclass like #[code English] or #[code German], that loads in lists
 | 
					    |  own subclass like #[code English] or #[code German], that loads in lists
 | 
				
			||||||
    |  of hard-coded data and exception rules.
 | 
					    |  of hard-coded data and exception rules.
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -114,6 +114,11 @@ include ../_includes/_mixins
 | 
				
			||||||
    .u-text-right
 | 
					    .u-text-right
 | 
				
			||||||
        +button(gh("spacy-notebooks"), false, "primary", "small") See more notebooks on GitHub
 | 
					        +button(gh("spacy-notebooks"), false, "primary", "small") See more notebooks on GitHub
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					+section("videos")
 | 
				
			||||||
 | 
					    +h(2, "videos") Videos
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    +youtube("sqDHBH9IjRU")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
+section("research")
 | 
					+section("research")
 | 
				
			||||||
    +h(2, "research") Research systems
 | 
					    +h(2, "research") Research systems
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||