Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-31 16:07:41 +03:00)

Merge branch 'master' of https://github.com/explosion/spaCy

Commit 4895b2e830

.github/contributors/ALSchwalm.md (106 changed lines, vendored, Normal file)
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                    |
+|------------------------------- | ------------------------ |
+| Name                           | Adam Schwalm             |
+| Company name (if applicable)   | Star Lab                 |
+| Title or role (if applicable)  | Software Engineer        |
+| Date                           | 2018-11-28               |
+| GitHub username                | ALSchwalm                |
+| Website (optional)             | https://alschwalm.com    |

.github/contributors/svlandeg.md (106 changed lines, vendored, Normal file)
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Sofie Van Landeghem  |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 29 Nov 2018          |
+| GitHub username                | svlandeg             |
+| Website (optional)             |                      |

.github/contributors/wxv.md (106 changed lines, vendored, Normal file)
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Jason Xu             |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2018-11-29           |
+| GitHub username                | wxv                  |
+| Website (optional)             |                      |

@@ -10,7 +10,7 @@ the **fastest syntactic parser** in the world, convolutional **neural network mo
 for tagging, parsing and **named entity recognition** and easy **deep learning**
 integration. It's commercial open-source software, released under the MIT license.
 
-💫 **Version 2.0 out now!** `Check out the new features here. <https://spacy.io/usage/v2>`_
+💫 **Version 2.0 out now!** `Check out the release notes here. <https://github.com/explosion/spaCy/releases>`_
 
 .. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis
     :target: https://travis-ci.org/explosion/spaCy
@@ -88,7 +88,7 @@ Features
 * **Fastest syntactic parser** in the world
 * **Named entity** recognition
 * Non-destructive **tokenization**
-* Support for **20+ languages**
+* Support for **30+ languages**
 * Pre-trained `statistical models <https://spacy.io/models>`_ and word vectors
 * Easy **deep learning** integration
 * Part-of-speech tagging
@@ -200,11 +200,6 @@ or manually by pointing pip to a path or URL.
     # pip install .tar.gz archive from path or URL
     pip install /Users/you/en_core_web_sm-2.0.0.tar.gz
 
-If you have SSL certification problems, SSL customization options are described in the help:
-
-    # help for the download command
-    python -m spacy download --help
-
 Loading and using models
 ------------------------
 

@@ -7,8 +7,8 @@ murmurhash>=0.28.0,<1.1.0
 plac<1.0.0,>=0.9.6
 ujson>=1.35
 dill>=0.2,<0.3
-regex>=2017.4.5,<2017.12.1
+regex==2018.01.10
 requests>=2.13.0,<3.0.0
-pytest>=3.6.0,<4.0.0
+pytest>=4.0.0,<5.0.0
 mock>=2.0.0,<3.0.0
 pathlib==1.0.1; python_version < "3.4"
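
The requirements hunk above pins `regex` to a single release and moves the `pytest` range up one major version. As a rough illustration of which versions each constraint admits, the sketch below checks a few sample version strings against the old and new specifiers. It assumes the third-party `packaging` library (not part of this diff) is installed, and the sample versions are chosen only to probe the boundaries, not to claim which releases exist.

```python
from packaging.specifiers import SpecifierSet

# Old and new constraints, copied from the requirements.txt hunk above.
OLD = {
    "regex": SpecifierSet(">=2017.4.5,<2017.12.1"),
    "pytest": SpecifierSet(">=3.6.0,<4.0.0"),
}
NEW = {
    "regex": SpecifierSet("==2018.01.10"),
    "pytest": SpecifierSet(">=4.0.0,<5.0.0"),
}


def allowed(constraints, name, version):
    """Return True if `version` of package `name` satisfies `constraints`."""
    return constraints[name].contains(version)


# Sample version strings used only to probe the range boundaries.
SAMPLES = [("regex", "2017.11.9"), ("regex", "2018.01.10"),
           ("pytest", "3.10.1"), ("pytest", "4.0.1")]

for name, version in SAMPLES:
    print(f"{name} {version}: old={allowed(OLD, name, version)} "
          f"new={allowed(NEW, name, version)}")
```

Under the new pins, any `regex` other than 2018.01.10 is rejected, and a pytest 3.x install no longer satisfies the test requirements.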
							
								
								
									

setup.py (2 changed lines)
@@ -200,7 +200,7 @@ def setup_package():
                 'plac<1.0.0,>=0.9.6',
                 'ujson>=1.35',
                 'dill>=0.2,<0.3',
-                'regex>=2017.4.5,<2017.12.1',
+                'regex==2018.01.10',
                 'requests>=2.13.0,<3.0.0',
                 'pathlib==1.0.1; python_version < "3.4"'],
             extras_require={

@@ -141,7 +141,7 @@ _regular_exp += ["^{prefix}[{hyphen}][{alpha}][{alpha}{elision}{other_hyphen}\-]
                  elision=ELISION, alpha=ALPHA_LOWER)
                  for p in _hyphen_prefix]
 _regular_exp += ["^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$".format(
-                 prefix=p, elision=HYPHENS, hyphen=_other_hyphens, alpha=ALPHA_LOWER)
+                 prefix=p, elision=ELISION, hyphen=_other_hyphens, alpha=ALPHA_LOWER)
                  for p in _elision_prefix]
 _regular_exp.append(URL_PATTERN)
 
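
In the hunk above, the elision-prefix patterns were formatted with `elision=HYPHENS`, so the character directly after the prefix had to be a hyphen rather than an apostrophe; the fix passes `ELISION` instead. The sketch below rebuilds one such pattern with Python's standard `re` module to show the difference. The constants are simplified, hypothetical stand-ins (spaCy's real `ELISION`, `HYPHENS`, `ALPHA_LOWER` and `_other_hyphens` are larger), and `entr` is only an illustrative prefix, not necessarily an entry of `_elision_prefix`.

```python
import re

# Hypothetical, heavily simplified stand-ins for the constants in the hunk;
# spaCy's real character classes are much larger.
ELISION = "'’"                     # straight and curly apostrophe
HYPHENS = r"\-‐"                   # ASCII hyphen (escaped for [...]) and U+2010
OTHER_HYPHENS = "‐"                # stand-in for _other_hyphens
ALPHA_LOWER = "a-zàâçéèêëîïôûùüÿ"  # stand-in for ALPHA_LOWER

# Same template as the patched line: prefix, one elision char, then letters.
TEMPLATE = r"^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$"


def elision_pattern(prefix, elision_chars):
    """Compile the template for one prefix and a given 'elision' class."""
    return re.compile(TEMPLATE.format(prefix=prefix, elision=elision_chars,
                                      hyphen=OTHER_HYPHENS, alpha=ALPHA_LOWER))


fixed = elision_pattern("entr", ELISION)  # after the fix: elision=ELISION
buggy = elision_pattern("entr", HYPHENS)  # before the fix: elision=HYPHENS

for word in ["entr'ouvrir", "entr-ouvrir"]:
    print(word, bool(fixed.match(word)), bool(buggy.match(word)))
```

With the old argument, the apostrophe form is rejected while the hyphenated form is accepted, which duplicates what the hyphen-prefix patterns already cover.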
@@ -33,7 +33,6 @@ def test_de_tokenizer_norm_exceptions(de_tokenizer, text, norms):
     assert [token.norm_ for token in tokens] == norms
 
 
-@pytest.mark.xfail
 @pytest.mark.parametrize('text,norm', [("daß", "dass")])
 def test_de_lex_attrs_norm_exceptions(de_tokenizer, text, norm):
     tokens = de_tokenizer(text)
@@ -61,7 +61,7 @@ def test_en_sbd_serialization_projective(EN):
 
 
 TEST_CASES = [
-    pytest.mark.xfail(("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."])),
+    pytest.param("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."], marks=pytest.mark.xfail()),
     ("What is your name? My name is Jonas.", ["What is your name?", "My name is Jonas."]),
     ("There it is! I found it.", ["There it is!", "I found it."]),
     ("My name is Jonas E. Smith.", ["My name is Jonas E. Smith."]),
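
The `TEST_CASES` edits in the hunk above replace `pytest.mark.xfail((text, expected))`, which wraps a whole parameter tuple in a mark, with `pytest.param(text, expected, marks=pytest.mark.xfail())`. pytest 4.0, which the updated requirements now demand, dropped support for applying marks directly to parameter values, and `pytest.param` is the supported replacement. A minimal sketch of the pattern follows; `split_sentences` is a hypothetical stand-in for the sentence segmenter under test, not spaCy's API.

```python
import pytest


def split_sentences(text):
    """Hypothetical stand-in segmenter: splits after every '. ' boundary."""
    pieces = text.split(". ")
    return [piece + "." for piece in pieces[:-1]] + [pieces[-1]]


# Ordinary cases stay plain tuples; known failures are wrapped in
# pytest.param(..., marks=...) instead of pytest.mark.xfail((...)).
CASES = [
    ("Hello World.", ["Hello World."]),
    pytest.param("St. Michael's Church.", ["St. Michael's Church."],
                 marks=pytest.mark.xfail(reason="naive split on 'St. '")),
]


@pytest.mark.parametrize("text,expected", CASES)
def test_split_sentences(text, expected):
    assert split_sentences(text) == expected
```

Because plain tuples and `pytest.param` entries can be mixed in one list, only the expected-failure cases needed rewriting; running `pytest` on this module should report one pass and one xfail.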
@@ -71,48 +71,48 @@ TEST_CASES = [
     ("Let's ask Jane and co. They should know.", ["Let's ask Jane and co.", "They should know."]),
     ("They closed the deal with Pitt, Briggs & Co. It closed yesterday.", ["They closed the deal with Pitt, Briggs & Co.", "It closed yesterday."]),
     ("I can see Mt. Fuji from here.", ["I can see Mt. Fuji from here."]),
-    pytest.mark.xfail(("St. Michael's Church is on 5th st. near the light.", ["St. Michael's Church is on 5th st. near the light."])),
+    pytest.param("St. Michael's Church is on 5th st. near the light.", ["St. Michael's Church is on 5th st. near the light."], marks=pytest.mark.xfail()),
     ("That is JFK Jr.'s book.", ["That is JFK Jr.'s book."]),
     ("I visited the U.S.A. last year.", ["I visited the U.S.A. last year."]),
     ("I live in the E.U. How about you?", ["I live in the E.U.", "How about you?"]),
     ("I live in the U.S. How about you?", ["I live in the U.S.", "How about you?"]),
     ("I work for the U.S. Government in Virginia.", ["I work for the U.S. Government in Virginia."]),
     ("I have lived in the U.S. for 20 years.", ["I have lived in the U.S. for 20 years."]),
-    pytest.mark.xfail(("At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.", ["At 5 a.m. Mr. Smith went to the bank.", "He left the bank at 6 P.M.", "Mr. Smith then went to the store."])),
+    pytest.param("At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.", ["At 5 a.m. Mr. Smith went to the bank.", "He left the bank at 6 P.M.", "Mr. Smith then went to the store."], marks=pytest.mark.xfail()),
     ("She has $100.00 in her bag.", ["She has $100.00 in her bag."]),
     ("She has $100.00. It is in her bag.", ["She has $100.00.", "It is in her bag."]),
     ("He teaches science (He previously worked for 5 years as an engineer.) at the local University.", ["He teaches science (He previously worked for 5 years as an engineer.) at the local University."]),
     ("Her email is Jane.Doe@example.com. I sent her an email.", ["Her email is Jane.Doe@example.com.", "I sent her an email."]),
     ("The site is: https://www.example.50.com/new-site/awesome_content.html. Please check it out.", ["The site is: https://www.example.50.com/new-site/awesome_content.html.", "Please check it out."]),
-    pytest.mark.xfail(("She turned to him, 'This is great.' she said.", ["She turned to him, 'This is great.' she said."])),
-    pytest.mark.xfail(('She turned to him, "This is great." she said.', ['She turned to him, "This is great." she said.'])),
+    pytest.param("She turned to him, 'This is great.' she said.", ["She turned to him, 'This is great.' she said."], marks=pytest.mark.xfail()),
+    pytest.param('She turned to him, "This is great." she said.', ['She turned to him, "This is great." she said.'], marks=pytest.mark.xfail()),
     ('She turned to him, "This is great." She held the book out to show him.', ['She turned to him, "This is great."', "She held the book out to show him."]),
     ("Hello!! Long time no see.", ["Hello!!", "Long time no see."]),
     ("Hello?? Who is there?", ["Hello??", "Who is there?"]),
     ("Hello!? Is that you?", ["Hello!?", "Is that you?"]),
     ("Hello?! Is that you?", ["Hello?!", "Is that you?"]),
-    pytest.mark.xfail(("1.) The first item 2.) The second item", ["1.) The first item", "2.) The second item"])),
-    pytest.mark.xfail(("1.) The first item. 2.) The second item.", ["1.) The first item.", "2.) The second item."])),
-    pytest.mark.xfail(("1) The first item 2) The second item", ["1) The first item", "2) The second item"])),
+    pytest.param("1.) The first item 2.) The second item", ["1.) The first item", "2.) The second item"], marks=pytest.mark.xfail()),
+    pytest.param("1.) The first item. 2.) The second item.", ["1.) The first item.", "2.) The second item."], marks=pytest.mark.xfail()),
+    pytest.param("1) The first item 2) The second item", ["1) The first item", "2) The second item"], marks=pytest.mark.xfail()),
     ("1) The first item. 2) The second item.", ["1) The first item.", "2) The second item."]),
-    pytest.mark.xfail(("1. The first item 2. The second item", ["1. The first item", "2. The second item"])),
-    pytest.mark.xfail(("1. The first item. 2. The second item.", ["1. The first item.", "2. The second item."])),
-    pytest.mark.xfail(("• 9. The first item • 10. The second item", ["• 9. The first item", "• 10. The second item"])),
-    pytest.mark.xfail(("⁃9. The first item ⁃10. The second item", ["⁃9. The first item", "⁃10. The second item"])),
-    pytest.mark.xfail(("a. The first item b. The second item c. The third list item", ["a. The first item", "b. The second item", "c. The third list item"])),
+    pytest.param("1. The first item 2. The second item", ["1. The first item", "2. The second item"], marks=pytest.mark.xfail()),
+    pytest.param("1. The first item. 2. The second item.", ["1. The first item.", "2. The second item."], marks=pytest.mark.xfail()),
+    pytest.param("• 9. The first item • 10. The second item", ["• 9. The first item", "• 10. The second item"], marks=pytest.mark.xfail()),
+    pytest.param("⁃9. The first item ⁃10. The second item", ["⁃9. The first item", "⁃10. The second item"], marks=pytest.mark.xfail()),
+    pytest.param("a. The first item b. The second item c. The third list item", ["a. The first item", "b. The second item", "c. The third list item"], marks=pytest.mark.xfail()),
     ("This is a sentence\ncut off in the middle because pdf.", ["This is a sentence\ncut off in the middle because pdf."]),
     ("It was a cold \nnight in the city.", ["It was a cold \nnight in the city."]),
-    pytest.mark.xfail(("features\ncontact manager\nevents, activities\n", ["features", "contact manager", "events, activities"])),
-    pytest.mark.xfail(("You can find it at N°. 1026.253.553. That is where the treasure is.", ["You can find it at N°. 1026.253.553.", "That is where the treasure is."])),
+    pytest.param("features\ncontact manager\nevents, activities\n", ["features", "contact manager", "events, activities"], marks=pytest.mark.xfail()),
+    pytest.param("You can find it at N°. 1026.253.553. That is where the treasure is.", ["You can find it at N°. 1026.253.553.", "That is where the treasure is."], marks=pytest.mark.xfail()),
     ("She works at Yahoo! in the accounting department.", ["She works at Yahoo! in the accounting department."]),
     ("We make a good team, you and I. Did you see Albert I. Jones yesterday?", ["We make a good team, you and I.", "Did you see Albert I. Jones yesterday?"]),
     ("Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”", ["Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”"]),
-    pytest.mark.xfail((""""Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""", ['"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'])),
+    pytest.param(""""Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""", ['"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'], marks=pytest.mark.xfail()),
     ("If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . . Next sentence.", ["If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . .", "Next sentence."]),
     ("I never meant that.... She left the store.", ["I never meant that....", "She left the store."]),
-    pytest.mark.xfail(("I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.", ["I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."])),
-    pytest.mark.xfail(("One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .", ["One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds.", ". . . The practice was not abandoned. . . ."])),
|     pytest.mark.xfail(("Hello world.Today is Tuesday.Mr. Smith went to the store and bought 1,000.That is a lot.", ["Hello world.", "Today is Tuesday.", "Mr. Smith went to the store and bought 1,000.", "That is a lot."])) | ||||
|     pytest.param("I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.", ["I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."], marks=pytest.mark.xfail()), | ||||
|     pytest.param("One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .", ["One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds.", ". . . The practice was not abandoned. . . ."], marks=pytest.mark.xfail()), | ||||
|     pytest.param("Hello world.Today is Tuesday.Mr. Smith went to the store and bought 1,000.That is a lot.", ["Hello world.", "Today is Tuesday.", "Mr. Smith went to the store and bought 1,000.", "That is a lot."], marks=pytest.mark.xfail()) | ||||
| ] | ||||
| 
 | ||||
| @pytest.mark.skip | ||||
|  |  | |||
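The test-file hunks in this commit all apply one migration. Marking a single parametrize value by wrapping its tuple, as in `pytest.mark.xfail(("text", 10))`, was deprecated during the pytest 3.x series and removed in pytest 4.0. The replacement is `pytest.param(*values, marks=...)`, which attaches the mark to that one parameter set. Below is a minimal, self-contained sketch of the pattern. The `CASES` list and `test_parses_as_int` are hypothetical and unrelated to spaCy's tokenizer tests.

```python
import pytest

# Old style (rejected by pytest >= 4.0):
#     pytest.mark.xfail("4.2")
# New style: the mark travels with the parameter set via pytest.param.
CASES = [
    "42",
    "-7",
    # Expected to fail: int() cannot parse a decimal string.
    pytest.param("4.2", marks=pytest.mark.xfail(raises=ValueError)),
]


@pytest.mark.parametrize("text", CASES)
def test_parses_as_int(text):
    assert isinstance(int(text), int)
```

Running `pytest` on this file should report two passes and one `xfailed` result. Multi-argument cases work the same way: `pytest.param("text", 10, marks=pytest.mark.xfail())`, as in the hunks above.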
|  | @ -29,7 +29,7 @@ untimely death" of the rapier-tongued Scottish barrister and parliamentarian. | |||
|     ("""Yes! "I'd rather have a walk", Ms. Comble sighed. """, 15), | ||||
|     ("""'Me too!', Mr. P. Delaware cried. """, 11), | ||||
|     ("They ran about 10km.", 6), | ||||
|     pytest.mark.xfail(("But then the 6,000-year ice age came...", 10))]) | ||||
|     pytest.param("But then the 6,000-year ice age came...", 10, marks=pytest.mark.xfail())]) | ||||
| def test_en_tokenizer_handles_cnts(en_tokenizer, text, length): | ||||
|     tokens = en_tokenizer(text) | ||||
|     assert len(tokens) == length | ||||
|  |  | |||
|  | @ -11,7 +11,7 @@ def fr_tokenizer(): | |||
| 
 | ||||
| 
 | ||||
| @pytest.mark.parametrize('text', ["aujourd'hui", "Aujourd'hui", "prud'hommes", | ||||
|                                   "prud’hommal"]) | ||||
|                                   "prud’hommal", "entr'amis"]) | ||||
| def test_tokenizer_infix_exceptions(fr_tokenizer, text): | ||||
|     tokens = fr_tokenizer(text) | ||||
|     assert len(tokens) == 1 | ||||
|  |  | |||
|  | @ -5,11 +5,11 @@ import pytest | |||
| 
 | ||||
| DEFAULT_TESTS = [ | ||||
|     ('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']), | ||||
|     pytest.mark.xfail(('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'])), | ||||
|     pytest.param('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'], marks=pytest.mark.xfail()), | ||||
|     ('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']), | ||||
|     ('A pl. rovidites.', ['A', 'pl.', 'rovidites', '.']), | ||||
|     ('A S.M.A.R.T. szo.', ['A', 'S.M.A.R.T.', 'szo', '.']), | ||||
|     pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])), | ||||
|     pytest.param('A .hu.', ['A', '.hu', '.'], marks=pytest.mark.xfail()), | ||||
|     ('Az egy.ketto.', ['Az', 'egy.ketto', '.']), | ||||
|     ('A pl.', ['A', 'pl.']), | ||||
|     ('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']), | ||||
|  | @ -227,11 +227,11 @@ QUOTE_TESTS = [ | |||
| 
 | ||||
| DOT_TESTS = [ | ||||
|     ('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']), | ||||
|     pytest.mark.xfail(('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'])), | ||||
|     pytest.param('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'], marks=pytest.mark.xfail()), | ||||
|     ('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']), | ||||
|     ('A pl. rövidítés.', ['A', 'pl.', 'rövidítés', '.']), | ||||
|     ('A S.M.A.R.T. szó.', ['A', 'S.M.A.R.T.', 'szó', '.']), | ||||
|     pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])), | ||||
|     pytest.param('A .hu.', ['A', '.hu', '.'], marks=pytest.mark.xfail()), | ||||
|     ('Az egy.ketto.', ['Az', 'egy.ketto', '.']), | ||||
|     ('A pl.', ['A', 'pl.']), | ||||
|     ('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']), | ||||
|  |  | |||
|  | @ -7,7 +7,6 @@ import pytest | |||
| from ...cli.train import train | ||||
| 
 | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| def test_cli_trained_model_can_be_saved(tmpdir): | ||||
|     lang = 'nl' | ||||
|     output_dir = str(tmpdir) | ||||
|  | @ -7,7 +7,6 @@ from ...vocab import Vocab | |||
| from ...tokens import Doc, Span | ||||
| 
 | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| def test_issue1547(): | ||||
|     """Test that entity labels still match after merging tokens.""" | ||||
|     words = ['\n', 'worda', '.', '\n', 'wordb', '-', 'Biosphere', '2', '-', ' \n'] | ||||
|  |  | |||
|  | @ -6,7 +6,7 @@ from ...vocab import Vocab | |||
| from ...tokens import Doc | ||||
| from ...matcher import Matcher | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| 
 | ||||
| def test_issue1945(): | ||||
|     text = "a a a" | ||||
|     matcher = Matcher(Vocab()) | ||||
|  |  | |||
|  | @ -4,7 +4,6 @@ import pytest | |||
| from ...gold import iob_to_biluo | ||||
| 
 | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| @pytest.mark.parametrize('tags', [('B-ORG', 'L-ORG'), | ||||
|                                   ('B-PERSON', 'I-PERSON', 'L-PERSON'), | ||||
|                                   ('U-BRAWLER', 'U-BRAWLER')]) | ||||
|  | @ -13,21 +12,18 @@ def test_issue2385_biluo(tags): | |||
|     assert iob_to_biluo(tags) == list(tags) | ||||
| 
 | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| @pytest.mark.parametrize('tags', [('B-BRAWLER', 'I-BRAWLER', 'I-BRAWLER')]) | ||||
| def test_issue2385_iob_bcharacter(tags): | ||||
|     """fix bug in labels with a 'b' character""" | ||||
|     assert iob_to_biluo(tags) == ['B-BRAWLER', 'I-BRAWLER', 'L-BRAWLER'] | ||||
| 
 | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| @pytest.mark.parametrize('tags', [('I-ORG', 'I-ORG', 'B-ORG')]) | ||||
| def test_issue2385_iob1(tags): | ||||
|     """maintain support for iob1 format""" | ||||
|     assert iob_to_biluo(tags) == ['B-ORG', 'L-ORG', 'U-ORG'] | ||||
| 
 | ||||
| 
 | ||||
| @pytest.mark.xfail | ||||
| @pytest.mark.parametrize('tags', [('B-PERSON', 'I-PERSON', 'B-PERSON')]) | ||||
| def test_issue2385_iob2(tags): | ||||
|     """maintain support for iob2 format""" | ||||
|  |  | |||
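The `test_issue2385_*` cases specify how `iob_to_biluo` is expected to handle three kinds of input: already-BILUO sequences, labels that contain the letter `B`, and IOB1 sequences that open an entity with `I-`. The sketch below is a hypothetical re-implementation, not spaCy's function (`iob_to_biluo_sketch` is an illustrative name). It makes the rules concrete. A tag starts an entity if it is `B-`/`U-`, or if it is an `I-` that cannot continue the previous token's entity. It ends an entity if it is `L-`/`U-`, or if the next token does not continue it.

```python
from typing import List, Sequence, Tuple


def _split(tag: str) -> Tuple[str, str]:
    # "B-ORG" -> ("B", "ORG"); "O" -> ("O", ""). maxsplit=1 keeps labels
    # like "WORK-OF-ART" intact and never strips letters from the label.
    if tag == "O":
        return "O", ""
    prefix, label = tag.split("-", 1)
    return prefix, label


def iob_to_biluo_sketch(tags: Sequence[str]) -> List[str]:
    """Hypothetical IOB1/IOB2 -> BILUO converter (illustration only)."""
    parsed = [_split(t) for t in tags]
    out = []
    for i, (prefix, label) in enumerate(parsed):
        if prefix == "O":
            out.append("O")
            continue
        prev_prefix, prev_label = parsed[i - 1] if i > 0 else ("O", "")
        next_prefix, next_label = parsed[i + 1] if i + 1 < len(parsed) else ("O", "")
        # Starts an entity: explicit B/U, or an I that cannot continue the
        # previous token's entity (IOB1 lets entities start with I-).
        begins = prefix in ("B", "U") or (
            prefix == "I" and (prev_label != label or prev_prefix in ("O", "L", "U"))
        )
        # Ends an entity: explicit L/U, or the next token does not continue it.
        ends = prefix in ("L", "U") or not (
            next_label == label and next_prefix in ("I", "L")
        )
        if begins and ends:
            out.append("U-" + label)
        elif begins:
            out.append("B-" + label)
        elif ends:
            out.append("L-" + label)
        else:
            out.append("I-" + label)
    return out
```

With this sketch, `('I-ORG', 'I-ORG', 'B-ORG')` becomes `['B-ORG', 'L-ORG', 'U-ORG']`, which matches the IOB1 test above.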
|  | @ -47,16 +47,16 @@ URLS_SHOULD_MATCH = [ | |||
|     "http://223.255.255.254", | ||||
|     "http://a.b--c.de/", # this is a legit domain name see: https://gist.github.com/dperini/729294 comment on 9/9/2014 | ||||
| 
 | ||||
|     pytest.mark.xfail("http://foo.com/blah_blah_(wikipedia)"), | ||||
|     pytest.mark.xfail("http://foo.com/blah_blah_(wikipedia)_(again)"), | ||||
|     pytest.mark.xfail("http://⌘.ws"), | ||||
|     pytest.mark.xfail("http://⌘.ws/"), | ||||
|     pytest.mark.xfail("http://☺.damowmow.com/"), | ||||
|     pytest.mark.xfail("http://✪df.ws/123"), | ||||
|     pytest.mark.xfail("http://➡.ws/䨹"), | ||||
|     pytest.mark.xfail("http://مثال.إختبار"), | ||||
|     pytest.mark.xfail("http://例子.测试"), | ||||
|     pytest.mark.xfail("http://उदाहरण.परीक्षा"), | ||||
|     pytest.param("http://foo.com/blah_blah_(wikipedia)", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://foo.com/blah_blah_(wikipedia)_(again)", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://⌘.ws", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://⌘.ws/", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://☺.damowmow.com/", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://✪df.ws/123", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://➡.ws/䨹", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://مثال.إختبار", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://例子.测试", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://उदाहरण.परीक्षा", marks=pytest.mark.xfail()), | ||||
| ] | ||||
| 
 | ||||
| URLS_SHOULD_NOT_MATCH = [ | ||||
|  | @ -95,10 +95,10 @@ URLS_SHOULD_NOT_MATCH = [ | |||
|     "http://10.1.1.1", | ||||
|     "NASDAQ:GOOG", | ||||
| 
 | ||||
|     pytest.mark.xfail("foo.com"), | ||||
|     pytest.mark.xfail("http://1.1.1.1.1"), | ||||
|     pytest.mark.xfail("http://www.foo.bar./"), | ||||
|     pytest.mark.xfail("http://-a.b.co"), | ||||
|     pytest.param("foo.com", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://1.1.1.1.1", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://www.foo.bar./", marks=pytest.mark.xfail()), | ||||
|     pytest.param("http://-a.b.co", marks=pytest.mark.xfail()), | ||||
| ] | ||||
| 
 | ||||
| 
 | ||||
|  |  | |||
|  | @ -297,7 +297,7 @@ cdef class Vocab: | |||
| 
 | ||||
|         self.vectors = Vectors(data=keep, keys=keys) | ||||
| 
 | ||||
|         syn_keys, syn_rows, scores = self.vectors.most_similar(toss) | ||||
|         syn_keys, syn_rows, scores = self.vectors.most_similar(toss, batch_size=batch_size) | ||||
| 
 | ||||
|         remap = {} | ||||
|         for i, key in enumerate(keys[nr_row:]): | ||||
|  |  | |||
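This hunk forwards the `batch_size` value that is already in scope in this method to `Vectors.most_similar`. As a result, the search for the nearest kept vector for each discarded row respects the caller's setting instead of falling back to the method's default. The reason for batching can be sketched in pure Python: queries are compared against the kept rows one chunk at a time, so at most `batch_size` rows of similarity scores exist in memory at once. This is an illustrative stand-in using cosine similarity, not spaCy's implementation. `most_similar_batched` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # Scale to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0] * len(v)


def most_similar_batched(
    queries: Sequence[Vector], keep: Sequence[Vector], batch_size: int = 1024
) -> List[Tuple[int, float]]:
    """Return (best_row, score) per query. Hypothetical helper, illustration only."""
    keep_n = [_normalize(v) for v in keep]
    results = []
    for start in range(0, len(queries), batch_size):
        batch = [_normalize(q) for q in queries[start:start + batch_size]]
        # Only this batch's similarity matrix is materialized.
        sims = [[sum(a * b for a, b in zip(q, k)) for k in keep_n] for q in batch]
        for row in sims:
            # max() returns the first index among ties.
            best = max(range(len(row)), key=row.__getitem__)
            results.append((best, row[best]))
    return results
```

The result does not depend on `batch_size`; only peak memory use does. That is why forwarding the argument matters for large vector tables.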
|  | @ -2,7 +2,7 @@ | |||
| 
 | ||||
| p | ||||
|     |  Models trained on the | ||||
|     |  #[+a("https://catalog.ldc.upenn.edu/ldc2013t19") OntoNotes 5] corpus | ||||
|     |  #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus | ||||
|     |  support the following entity types: | ||||
| 
 | ||||
| +table(["Type", "Description"]) | ||||
|  |  | |||
|  | @ -352,6 +352,7 @@ p Retokenize the document, such that the span is merged into a single token. | |||
| +h(2, "ents") Span.ents | ||||
|     +tag property | ||||
|     +tag-model("NER") | ||||
|     +tag-new("2.0.12") | ||||
| 
 | ||||
| p | ||||
|     |  Iterate over the entities in the span. Yields named-entity | ||||
|  |  | |||
|  | @ -714,7 +714,7 @@ p The L2 norm of the token's vector representation. | |||
|         +cell bool | ||||
|         +cell | ||||
|             |  Does the token consist of ASCII characters? Equivalent to | ||||
|             |  #[code [any(ord(c) >= 128 for c in token.text)]]. | ||||
|             |  #[code all(ord(c) < 128 for c in token.text)]. | ||||
| 
 | ||||
|     +row | ||||
|         +cell #[code is_digit] | ||||
|  |  | |||
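The corrected `is_ascii` description can be checked in plain Python. The previous wording, `[any(ord(c) >= 128 for c in token.text)]`, had two problems. Its logic was inverted, since it is true for text containing non-ASCII characters. And, wrapped in brackets, it would build a one-element list, which is always truthy. The function names below are illustrative. On Python 3.7+, `str.isascii()` gives the same answer.

```python
def is_ascii(text: str) -> bool:
    # True when every character is a 7-bit ASCII code point (0-127).
    # For the empty string, all() returns True.
    return all(ord(c) < 128 for c in text)


def old_doc_expression(text: str) -> bool:
    # The expression previously shown in the docs (without the brackets):
    # it is True for text that is *not* ASCII.
    return any(ord(c) >= 128 for c in text)
```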
|  | @ -31,13 +31,13 @@ p | |||
|         nlp = spacy.blank('fi')  # blank instance | ||||
| 
 | ||||
| +table(["Language", "Code", "Language data"]) | ||||
|     for lang, code in LANGUAGES | ||||
|         if !Object.keys(MODELS).includes(code) | ||||
|             +row | ||||
|                 +cell #{LANGUAGES[code]} | ||||
|                 +cell #[code=code] | ||||
|                 +cell | ||||
|                     +src(gh("spaCy", "spacy/lang/" + code)) #[code lang/#{code}] | ||||
|     - var sorted_langs = Object.assign({}, ...Object.keys(LANGUAGES).filter(key => !MODELS[key]).sort().map(key => ({ [key]: LANGUAGES[key] }))) | ||||
|     for lang, code in sorted_langs | ||||
|         +row | ||||
|             +cell #{LANGUAGES[code]} | ||||
|             +cell #[code=code] | ||||
|             +cell | ||||
|                 +src(gh("spaCy", "spacy/lang/" + code)) #[code lang/#{code}] | ||||
| 
 | ||||
| +infobox("Dependencies") | ||||
|     .o-block-small Some language tokenizers require external dependencies. | ||||
|  |  | |||
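The template change replaces the unsorted loop with a `sorted_langs` object. It is built by dropping every code that has an entry in `MODELS` and sorting the remaining keys. Because `.sort()` is applied to the keys, the table is ordered by language code, not by display name. Below is a Python analogue of that expression. `languages_without_models` is a hypothetical name, and it relies on dicts preserving insertion order (Python 3.7+).

```python
from typing import Dict, Mapping


def languages_without_models(
    languages: Mapping[str, str], models: Mapping[str, object]
) -> Dict[str, str]:
    # Mirrors: Object.keys(LANGUAGES).filter(key => !MODELS[key]).sort()
    # `not models.get(code)` matches JavaScript's falsy check on MODELS[key].
    return {code: languages[code] for code in sorted(languages) if not models.get(code)}
```

For example, with `{"hr": "Croatian", "cs": "Czech"}`, Czech (`cs`) is listed before Croatian (`hr`), even though "Croatian" sorts first alphabetically.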