mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-10 19:57:17 +03:00
Merge branch 'master' of https://github.com/explosion/spaCy
commit
4895b2e830
106
.github/contributors/ALSchwalm.md
vendored
Normal file
|
@@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | ------------------------ |
|
||||
| Name | Adam Schwalm |
|
||||
| Company name (if applicable) | Star Lab |
|
||||
| Title or role (if applicable) | Software Engineer |
|
||||
| Date | 2018-11-28 |
|
||||
| GitHub username | ALSchwalm |
|
||||
| Website (optional) | https://alschwalm.com |
|
106
.github/contributors/svlandeg.md
vendored
Normal file
|
@@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Sofie Van Landeghem |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 29 Nov 2018 |
|
||||
| GitHub username | svlandeg |
|
||||
| Website (optional) | |
|
106
.github/contributors/wxv.md
vendored
Normal file
|
@@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Jason Xu |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 2018-11-29 |
|
||||
| GitHub username | wxv |
|
||||
| Website (optional) | |
|
|
@@ -10,7 +10,7 @@ the **fastest syntactic parser** in the world, convolutional **neural network mo
|
|||
for tagging, parsing and **named entity recognition** and easy **deep learning**
|
||||
integration. It's commercial open-source software, released under the MIT license.
|
||||
|
||||
💫 **Version 2.0 out now!** `Check out the new features here. <https://spacy.io/usage/v2>`_
|
||||
💫 **Version 2.0 out now!** `Check out the release notes here. <https://github.com/explosion/spaCy/releases>`_
|
||||
|
||||
.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis
|
||||
:target: https://travis-ci.org/explosion/spaCy
|
||||
|
@@ -88,7 +88,7 @@ Features
|
|||
* **Fastest syntactic parser** in the world
|
||||
* **Named entity** recognition
|
||||
* Non-destructive **tokenization**
|
||||
* Support for **20+ languages**
|
||||
* Support for **30+ languages**
|
||||
* Pre-trained `statistical models <https://spacy.io/models>`_ and word vectors
|
||||
* Easy **deep learning** integration
|
||||
* Part-of-speech tagging
|
||||
|
@@ -200,11 +200,6 @@ or manually by pointing pip to a path or URL.
|
|||
# pip install .tar.gz archive from path or URL
|
||||
pip install /Users/you/en_core_web_sm-2.0.0.tar.gz
|
||||
|
||||
If you have SSL certification problems, SSL customization options are described in the help:
|
||||
|
||||
# help for the download command
|
||||
python -m spacy download --help
|
||||
|
||||
Loading and using models
|
||||
------------------------
|
||||
|
||||
|
|
|
@@ -7,8 +7,8 @@ murmurhash>=0.28.0,<1.1.0
|
|||
plac<1.0.0,>=0.9.6
|
||||
ujson>=1.35
|
||||
dill>=0.2,<0.3
|
||||
regex>=2017.4.5,<2017.12.1
|
||||
regex==2018.01.10
|
||||
requests>=2.13.0,<3.0.0
|
||||
pytest>=3.6.0,<4.0.0
|
||||
pytest>=4.0.0,<5.0.0
|
||||
mock>=2.0.0,<3.0.0
|
||||
pathlib==1.0.1; python_version < "3.4"
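The new `regex==2018.01.10` pin lies outside the range the previous specifier accepted. A minimal sketch of that check, assuming plain dotted numeric versions and using a hypothetical `as_tuple` helper with tuple comparison rather than a full PEP 440 parser:

```python
def as_tuple(version: str) -> tuple:
    """Split a dotted version into integers (hypothetical helper, not PEP 440 complete)."""
    return tuple(int(part) for part in version.split("."))


def in_old_range(version: str) -> bool:
    """True if `version` satisfies the old specifier regex>=2017.4.5,<2017.12.1."""
    return as_tuple("2017.4.5") <= as_tuple(version) < as_tuple("2017.12.1")


if __name__ == "__main__":
    # A release inside the old range, then the newly pinned release.
    print(in_old_range("2017.11.9"))
    print(in_old_range("2018.01.10"))
```

Note that `2018.01.10` and `2018.1.10` compare equal under this helper because each component is converted to an integer.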
|
||||
|
|
2
setup.py
|
@@ -200,7 +200,7 @@ def setup_package():
|
|||
'plac<1.0.0,>=0.9.6',
|
||||
'ujson>=1.35',
|
||||
'dill>=0.2,<0.3',
|
||||
'regex>=2017.4.5,<2017.12.1',
|
||||
'regex==2018.01.10',
|
||||
'requests>=2.13.0,<3.0.0',
|
||||
'pathlib==1.0.1; python_version < "3.4"'],
|
||||
extras_require={
|
||||
|
|
|
@@ -141,7 +141,7 @@ _regular_exp += ["^{prefix}[{hyphen}][{alpha}][{alpha}{elision}{other_hyphen}\-]
|
|||
elision=ELISION, alpha=ALPHA_LOWER)
|
||||
for p in _hyphen_prefix]
|
||||
_regular_exp += ["^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$".format(
|
||||
prefix=p, elision=HYPHENS, hyphen=_other_hyphens, alpha=ALPHA_LOWER)
|
||||
prefix=p, elision=ELISION, hyphen=_other_hyphens, alpha=ALPHA_LOWER)
|
||||
for p in _elision_prefix]
|
||||
_regular_exp.append(URL_PATTERN)
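The one-word change in this hunk (passing `ELISION` instead of `HYPHENS` for the `elision` slot) decides which character may follow an elision prefix. The sketch below reuses the template string from the hunk, but the character classes and the `entr` prefix are simplified stand-ins chosen for illustration; they are assumptions, not spaCy's actual definitions.

```python
import re

# Simplified stand-ins (assumptions); spaCy's real classes are much larger.
ELISION = "'’"
HYPHENS = "-–—"
OTHER_HYPHENS = "–—"
ALPHA_LOWER = "a-zà-ÿ"

# Template copied from the hunk above.
TEMPLATE = r"^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$"


def build(prefix: str, elision: str) -> "re.Pattern":
    """Compile the template for one prefix with the given `elision` class."""
    return re.compile(TEMPLATE.format(
        prefix=prefix, elision=elision, hyphen=OTHER_HYPHENS, alpha=ALPHA_LOWER))


buggy = build("entr", HYPHENS)   # before: hyphens passed as the elision class
fixed = build("entr", ELISION)   # after: apostrophes passed as the elision class

if __name__ == "__main__":
    for word in ("entr'amis", "entr-amis"):
        print(word, bool(buggy.match(word)), bool(fixed.match(word)))
```

With these stand-ins, the old arguments accept a hyphen after the prefix and reject the apostrophe form, while the corrected arguments accept `entr'amis`.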
|
||||
|
||||
|
|
|
@@ -33,7 +33,6 @@ def test_de_tokenizer_norm_exceptions(de_tokenizer, text, norms):
|
|||
assert [token.norm_ for token in tokens] == norms
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
@pytest.mark.parametrize('text,norm', [("daß", "dass")])
|
||||
def test_de_lex_attrs_norm_exceptions(de_tokenizer, text, norm):
|
||||
tokens = de_tokenizer(text)
|
||||
|
|
|
@@ -61,7 +61,7 @@ def test_en_sbd_serialization_projective(EN):
|
|||
|
||||
|
||||
TEST_CASES = [
|
||||
pytest.mark.xfail(("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."])),
|
||||
pytest.param("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."], marks=pytest.mark.xfail()),
|
||||
("What is your name? My name is Jonas.", ["What is your name?", "My name is Jonas."]),
|
||||
("There it is! I found it.", ["There it is!", "I found it."]),
|
||||
("My name is Jonas E. Smith.", ["My name is Jonas E. Smith."]),
|
||||
|
@@ -71,48 +71,48 @@ TEST_CASES = [
|
|||
("Let's ask Jane and co. They should know.", ["Let's ask Jane and co.", "They should know."]),
|
||||
("They closed the deal with Pitt, Briggs & Co. It closed yesterday.", ["They closed the deal with Pitt, Briggs & Co.", "It closed yesterday."]),
|
||||
("I can see Mt. Fuji from here.", ["I can see Mt. Fuji from here."]),
|
||||
pytest.mark.xfail(("St. Michael's Church is on 5th st. near the light.", ["St. Michael's Church is on 5th st. near the light."])),
|
||||
pytest.param("St. Michael's Church is on 5th st. near the light.", ["St. Michael's Church is on 5th st. near the light."], marks=pytest.mark.xfail()),
|
||||
("That is JFK Jr.'s book.", ["That is JFK Jr.'s book."]),
|
||||
("I visited the U.S.A. last year.", ["I visited the U.S.A. last year."]),
|
||||
("I live in the E.U. How about you?", ["I live in the E.U.", "How about you?"]),
|
||||
("I live in the U.S. How about you?", ["I live in the U.S.", "How about you?"]),
|
||||
("I work for the U.S. Government in Virginia.", ["I work for the U.S. Government in Virginia."]),
|
||||
("I have lived in the U.S. for 20 years.", ["I have lived in the U.S. for 20 years."]),
|
||||
pytest.mark.xfail(("At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.", ["At 5 a.m. Mr. Smith went to the bank.", "He left the bank at 6 P.M.", "Mr. Smith then went to the store."])),
|
||||
pytest.param("At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.", ["At 5 a.m. Mr. Smith went to the bank.", "He left the bank at 6 P.M.", "Mr. Smith then went to the store."], marks=pytest.mark.xfail()),
|
||||
("She has $100.00 in her bag.", ["She has $100.00 in her bag."]),
|
||||
("She has $100.00. It is in her bag.", ["She has $100.00.", "It is in her bag."]),
|
||||
("He teaches science (He previously worked for 5 years as an engineer.) at the local University.", ["He teaches science (He previously worked for 5 years as an engineer.) at the local University."]),
|
||||
("Her email is Jane.Doe@example.com. I sent her an email.", ["Her email is Jane.Doe@example.com.", "I sent her an email."]),
|
||||
("The site is: https://www.example.50.com/new-site/awesome_content.html. Please check it out.", ["The site is: https://www.example.50.com/new-site/awesome_content.html.", "Please check it out."]),
|
||||
pytest.mark.xfail(("She turned to him, 'This is great.' she said.", ["She turned to him, 'This is great.' she said."])),
|
||||
pytest.mark.xfail(('She turned to him, "This is great." she said.', ['She turned to him, "This is great." she said.'])),
|
||||
pytest.param("She turned to him, 'This is great.' she said.", ["She turned to him, 'This is great.' she said."], marks=pytest.mark.xfail()),
|
||||
pytest.param('She turned to him, "This is great." she said.', ['She turned to him, "This is great." she said.'], marks=pytest.mark.xfail()),
|
||||
('She turned to him, "This is great." She held the book out to show him.', ['She turned to him, "This is great."', "She held the book out to show him."]),
|
||||
("Hello!! Long time no see.", ["Hello!!", "Long time no see."]),
|
||||
("Hello?? Who is there?", ["Hello??", "Who is there?"]),
|
||||
("Hello!? Is that you?", ["Hello!?", "Is that you?"]),
|
||||
("Hello?! Is that you?", ["Hello?!", "Is that you?"]),
|
||||
pytest.mark.xfail(("1.) The first item 2.) The second item", ["1.) The first item", "2.) The second item"])),
|
||||
pytest.mark.xfail(("1.) The first item. 2.) The second item.", ["1.) The first item.", "2.) The second item."])),
|
||||
pytest.mark.xfail(("1) The first item 2) The second item", ["1) The first item", "2) The second item"])),
|
||||
pytest.param("1.) The first item 2.) The second item", ["1.) The first item", "2.) The second item"], marks=pytest.mark.xfail()),
|
||||
pytest.param("1.) The first item. 2.) The second item.", ["1.) The first item.", "2.) The second item."], marks=pytest.mark.xfail()),
|
||||
pytest.param("1) The first item 2) The second item", ["1) The first item", "2) The second item"], marks=pytest.mark.xfail()),
|
||||
("1) The first item. 2) The second item.", ["1) The first item.", "2) The second item."]),
|
||||
pytest.mark.xfail(("1. The first item 2. The second item", ["1. The first item", "2. The second item"])),
|
||||
pytest.mark.xfail(("1. The first item. 2. The second item.", ["1. The first item.", "2. The second item."])),
|
||||
pytest.mark.xfail(("• 9. The first item • 10. The second item", ["• 9. The first item", "• 10. The second item"])),
|
||||
pytest.mark.xfail(("⁃9. The first item ⁃10. The second item", ["⁃9. The first item", "⁃10. The second item"])),
|
||||
pytest.mark.xfail(("a. The first item b. The second item c. The third list item", ["a. The first item", "b. The second item", "c. The third list item"])),
|
||||
pytest.param("1. The first item 2. The second item", ["1. The first item", "2. The second item"], marks=pytest.mark.xfail()),
|
||||
pytest.param("1. The first item. 2. The second item.", ["1. The first item.", "2. The second item."], marks=pytest.mark.xfail()),
|
||||
pytest.param("• 9. The first item • 10. The second item", ["• 9. The first item", "• 10. The second item"], marks=pytest.mark.xfail()),
|
||||
pytest.param("⁃9. The first item ⁃10. The second item", ["⁃9. The first item", "⁃10. The second item"], marks=pytest.mark.xfail()),
|
||||
pytest.param("a. The first item b. The second item c. The third list item", ["a. The first item", "b. The second item", "c. The third list item"], marks=pytest.mark.xfail()),
|
||||
("This is a sentence\ncut off in the middle because pdf.", ["This is a sentence\ncut off in the middle because pdf."]),
|
||||
("It was a cold \nnight in the city.", ["It was a cold \nnight in the city."]),
|
||||
pytest.mark.xfail(("features\ncontact manager\nevents, activities\n", ["features", "contact manager", "events, activities"])),
|
||||
pytest.mark.xfail(("You can find it at N°. 1026.253.553. That is where the treasure is.", ["You can find it at N°. 1026.253.553.", "That is where the treasure is."])),
|
||||
pytest.param("features\ncontact manager\nevents, activities\n", ["features", "contact manager", "events, activities"], marks=pytest.mark.xfail()),
|
||||
pytest.param("You can find it at N°. 1026.253.553. That is where the treasure is.", ["You can find it at N°. 1026.253.553.", "That is where the treasure is."], marks=pytest.mark.xfail()),
|
||||
("She works at Yahoo! in the accounting department.", ["She works at Yahoo! in the accounting department."]),
|
||||
("We make a good team, you and I. Did you see Albert I. Jones yesterday?", ["We make a good team, you and I.", "Did you see Albert I. Jones yesterday?"]),
|
||||
("Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”", ["Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”"]),
|
||||
pytest.mark.xfail((""""Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""", ['"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'])),
|
||||
pytest.param(""""Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""", ['"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'], marks=pytest.mark.xfail()),
|
||||
("If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . . Next sentence.", ["If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . .", "Next sentence."]),
|
||||
("I never meant that.... She left the store.", ["I never meant that....", "She left the store."]),
|
||||
pytest.mark.xfail(("I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.", ["I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."])),
|
||||
pytest.mark.xfail(("One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .", ["One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds.", ". . . The practice was not abandoned. . . ."])),
|
||||
pytest.mark.xfail(("Hello world.Today is Tuesday.Mr. Smith went to the store and bought 1,000.That is a lot.", ["Hello world.", "Today is Tuesday.", "Mr. Smith went to the store and bought 1,000.", "That is a lot."]))
|
||||
pytest.param("I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.", ["I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."], marks=pytest.mark.xfail()),
|
||||
pytest.param("One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .", ["One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds.", ". . . The practice was not abandoned. . . ."], marks=pytest.mark.xfail()),
|
||||
pytest.param("Hello world.Today is Tuesday.Mr. Smith went to the store and bought 1,000.That is a lot.", ["Hello world.", "Today is Tuesday.", "Mr. Smith went to the store and bought 1,000.", "That is a lot."], marks=pytest.mark.xfail())
|
||||
]
|
||||
|
||||
@pytest.mark.skip
|
||||
|
|
|
@@ -29,7 +29,7 @@ untimely death" of the rapier-tongued Scottish barrister and parliamentarian.
|
|||
("""Yes! "I'd rather have a walk", Ms. Comble sighed. """, 15),
|
||||
("""'Me too!', Mr. P. Delaware cried. """, 11),
|
||||
("They ran about 10km.", 6),
|
||||
pytest.mark.xfail(("But then the 6,000-year ice age came...", 10))])
|
||||
pytest.param("But then the 6,000-year ice age came...", 10, marks=pytest.mark.xfail())])
|
||||
def test_en_tokenizer_handles_cnts(en_tokenizer, text, length):
|
||||
tokens = en_tokenizer(text)
|
||||
assert len(tokens) == length
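The test hunks in this commit all apply the same mechanical rewrite: a parameter tuple wrapped in `pytest.mark.xfail((...))` becomes `pytest.param(..., marks=pytest.mark.xfail())`. As a rough illustration of that transformation, here is a sketch of a converter built on the standard-library `ast` module. The `convert_xfail_case` helper is hypothetical and not part of spaCy or pytest, and it assumes Python 3.9+ for `ast.unparse`.

```python
import ast


def convert_xfail_case(source: str) -> str:
    """Rewrite one `pytest.mark.xfail((a, b))` list entry as a `pytest.param` call.

    Entries that do not have the old shape are returned unchanged.
    """
    # Drop surrounding whitespace and the trailing comma of a list entry.
    node = ast.parse(source.strip().rstrip(","), mode="eval").body
    is_old_style = (
        isinstance(node, ast.Call)
        and ast.unparse(node.func) == "pytest.mark.xfail"
        and len(node.args) == 1
        and isinstance(node.args[0], ast.Tuple)
        and not node.keywords
    )
    if not is_old_style:
        return source
    # Unpack the wrapped tuple into positional arguments of pytest.param.
    values = ", ".join(ast.unparse(element) for element in node.args[0].elts)
    return f"pytest.param({values}, marks=pytest.mark.xfail())"


if __name__ == "__main__":
    print(convert_xfail_case("pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])),"))
    print(convert_xfail_case("('A pl.', ['A', 'pl.']),"))
```

Because `ast.unparse` regenerates the source text, string literals may come back with different quoting than the input, and the trailing comma of a converted entry is dropped; plain tuple entries pass through untouched.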
|
||||
|
|
|
@@ -11,7 +11,7 @@ def fr_tokenizer():
|
|||
|
||||
|
||||
@pytest.mark.parametrize('text', ["aujourd'hui", "Aujourd'hui", "prud'hommes",
|
||||
"prud’hommal"])
|
||||
"prud’hommal", "entr'amis"])
|
||||
def test_tokenizer_infix_exceptions(fr_tokenizer, text):
|
||||
tokens = fr_tokenizer(text)
|
||||
assert len(tokens) == 1
|
||||
|
|
|
@@ -5,11 +5,11 @@ import pytest
|
|||
|
||||
DEFAULT_TESTS = [
|
||||
('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']),
|
||||
pytest.mark.xfail(('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'])),
|
||||
pytest.param('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'], marks=pytest.mark.xfail()),
|
||||
('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']),
|
||||
('A pl. rovidites.', ['A', 'pl.', 'rovidites', '.']),
|
||||
('A S.M.A.R.T. szo.', ['A', 'S.M.A.R.T.', 'szo', '.']),
|
||||
pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])),
|
||||
pytest.param('A .hu.', ['A', '.hu', '.'], marks=pytest.mark.xfail()),
|
||||
('Az egy.ketto.', ['Az', 'egy.ketto', '.']),
|
||||
('A pl.', ['A', 'pl.']),
|
||||
('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']),
|
||||
|
@@ -227,11 +227,11 @@ QUOTE_TESTS = [
|
|||
|
||||
DOT_TESTS = [
|
||||
('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']),
|
||||
pytest.mark.xfail(('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'])),
|
||||
pytest.param('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'], marks=pytest.mark.xfail()),
|
||||
('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']),
|
||||
('A pl. rövidítés.', ['A', 'pl.', 'rövidítés', '.']),
|
||||
('A S.M.A.R.T. szó.', ['A', 'S.M.A.R.T.', 'szó', '.']),
|
||||
pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])),
|
||||
pytest.param('A .hu.', ['A', '.hu', '.'], marks=pytest.mark.xfail()),
|
||||
('Az egy.ketto.', ['Az', 'egy.ketto', '.']),
|
||||
('A pl.', ['A', 'pl.']),
|
||||
('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']),
|
||||
|
|
|
@@ -7,7 +7,6 @@ import pytest
|
|||
from ...cli.train import train
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
def test_cli_trained_model_can_be_saved(tmpdir):
|
||||
lang = 'nl'
|
||||
output_dir = str(tmpdir)
|
|
@@ -7,7 +7,6 @@ from ...vocab import Vocab
|
|||
from ...tokens import Doc, Span
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
def test_issue1547():
|
||||
"""Test that entity labels still match after merging tokens."""
|
||||
words = ['\n', 'worda', '.', '\n', 'wordb', '-', 'Biosphere', '2', '-', ' \n']
|
||||
|
|
|
@@ -6,7 +6,7 @@ from ...vocab import Vocab
|
|||
from ...tokens import Doc
|
||||
from ...matcher import Matcher
|
||||
|
||||
@pytest.mark.xfail
|
||||
|
||||
def test_issue1945():
|
||||
text = "a a a"
|
||||
matcher = Matcher(Vocab())
|
||||
|
|
|
@@ -4,7 +4,6 @@ import pytest
|
|||
from ...gold import iob_to_biluo
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
@pytest.mark.parametrize('tags', [('B-ORG', 'L-ORG'),
|
||||
('B-PERSON', 'I-PERSON', 'L-PERSON'),
|
||||
('U-BRAWLER', 'U-BRAWLER')])
|
||||
|
@@ -13,21 +12,18 @@ def test_issue2385_biluo(tags):
|
|||
assert iob_to_biluo(tags) == list(tags)
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
@pytest.mark.parametrize('tags', [('B-BRAWLER', 'I-BRAWLER', 'I-BRAWLER')])
|
||||
def test_issue2385_iob_bcharacter(tags):
|
||||
"""fix bug in labels with a 'b' character"""
|
||||
assert iob_to_biluo(tags) == ['B-BRAWLER', 'I-BRAWLER', 'L-BRAWLER']
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
@pytest.mark.parametrize('tags', [('I-ORG', 'I-ORG', 'B-ORG')])
|
||||
def test_issue2385_iob1(tags):
|
||||
"""maintain support for iob1 format"""
|
||||
assert iob_to_biluo(tags) == ['B-ORG', 'L-ORG', 'U-ORG']
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
@pytest.mark.parametrize('tags', [('B-PERSON', 'I-PERSON', 'B-PERSON')])
|
||||
def test_issue2385_iob2(tags):
|
||||
"""maintain support for iob2 format"""
|
||||
|
|
|
@@ -47,16 +47,16 @@ URLS_SHOULD_MATCH = [
|
|||
"http://223.255.255.254",
|
||||
"http://a.b--c.de/", # this is a legit domain name see: https://gist.github.com/dperini/729294 comment on 9/9/2014
|
||||
|
||||
pytest.mark.xfail("http://foo.com/blah_blah_(wikipedia)"),
|
||||
pytest.mark.xfail("http://foo.com/blah_blah_(wikipedia)_(again)"),
|
||||
pytest.mark.xfail("http://⌘.ws"),
|
||||
pytest.mark.xfail("http://⌘.ws/"),
|
||||
pytest.mark.xfail("http://☺.damowmow.com/"),
|
||||
pytest.mark.xfail("http://✪df.ws/123"),
|
||||
pytest.mark.xfail("http://➡.ws/䨹"),
|
||||
pytest.mark.xfail("http://مثال.إختبار"),
|
||||
pytest.mark.xfail("http://例子.测试"),
|
||||
pytest.mark.xfail("http://उदाहरण.परीक्षा"),
|
||||
pytest.param("http://foo.com/blah_blah_(wikipedia)", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://foo.com/blah_blah_(wikipedia)_(again)", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://⌘.ws", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://⌘.ws/", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://☺.damowmow.com/", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://✪df.ws/123", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://➡.ws/䨹", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://مثال.إختبار", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://例子.测试", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://उदाहरण.परीक्षा", marks=pytest.mark.xfail()),
|
||||
]
|
||||
|
||||
URLS_SHOULD_NOT_MATCH = [
|
||||
|
@@ -95,10 +95,10 @@ URLS_SHOULD_NOT_MATCH = [
|
|||
"http://10.1.1.1",
|
||||
"NASDAQ:GOOG",
|
||||
|
||||
pytest.mark.xfail("foo.com"),
|
||||
pytest.mark.xfail("http://1.1.1.1.1"),
|
||||
pytest.mark.xfail("http://www.foo.bar./"),
|
||||
pytest.mark.xfail("http://-a.b.co"),
|
||||
pytest.param("foo.com", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://1.1.1.1.1", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://www.foo.bar./", marks=pytest.mark.xfail()),
|
||||
pytest.param("http://-a.b.co", marks=pytest.mark.xfail()),
|
||||
]
|
||||
|
||||
|
||||
|
|
|
@@ -297,7 +297,7 @@ cdef class Vocab:
|
|||
|
||||
self.vectors = Vectors(data=keep, keys=keys)
|
||||
|
||||
syn_keys, syn_rows, scores = self.vectors.most_similar(toss)
|
||||
syn_keys, syn_rows, scores = self.vectors.most_similar(toss, batch_size=batch_size)
|
||||
|
||||
remap = {}
|
||||
for i, key in enumerate(keys[nr_row:]):
|
||||
|
|
|
@@ -2,7 +2,7 @@
|
|||
|
||||
p
|
||||
| Models trained on the
|
||||
| #[+a("https://catalog.ldc.upenn.edu/ldc2013t19") OntoNotes 5] corpus
|
||||
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus
|
||||
| support the following entity types:
|
||||
|
||||
+table(["Type", "Description"])
|
||||
|
|
|
@@ -352,6 +352,7 @@ p Retokenize the document, such that the span is merged into a single token.
|
|||
+h(2, "ents") Span.ents
|
||||
+tag property
|
||||
+tag-model("NER")
|
||||
+tag-new("2.0.12")
|
||||
|
||||
p
|
||||
| Iterate over the entities in the span. Yields named-entity
|
||||
|
|
|
@@ -714,7 +714,7 @@ p The L2 norm of the token's vector representation.
|
|||
+cell bool
|
||||
+cell
|
||||
| Does the token consist of ASCII characters? Equivalent to
|
||||
| #[code [any(ord(c) >= 128 for c in token.text)]].
|
||||
| #[code all(ord(c) < 128 for c in token.text)].
|
||||
|
||||
+row
|
||||
+cell #[code is_digit]
|
||||
|
|
|
@@ -31,13 +31,13 @@ p
|
|||
nlp = spacy.blank('fi') # blank instance
|
||||
|
||||
+table(["Language", "Code", "Language data"])
|
||||
for lang, code in LANGUAGES
|
||||
if !Object.keys(MODELS).includes(code)
|
||||
+row
|
||||
+cell #{LANGUAGES[code]}
|
||||
+cell #[code=code]
|
||||
+cell
|
||||
+src(gh("spaCy", "spacy/lang/" + code)) #[code lang/#{code}]
|
||||
- var sorted_langs = Object.assign({}, ...Object.keys(LANGUAGES).filter(key => !MODELS[key]).sort().map(key => ({ [key]: LANGUAGES[key] })))
|
||||
for lang, code in sorted_langs
|
||||
+row
|
||||
+cell #{LANGUAGES[code]}
|
||||
+cell #[code=code]
|
||||
+cell
|
||||
+src(gh("spaCy", "spacy/lang/" + code)) #[code lang/#{code}]
|
||||
|
||||
+infobox("Dependencies")
|
||||
.o-block-small Some language tokenizers require external dependencies.
|
||||
|
|