mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-13 10:46:29 +03:00
Merge branch 'master' of https://github.com/explosion/spaCy
commit 4895b2e830
106
.github/contributors/ALSchwalm.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field                          | Entry                    |
|------------------------------- | ------------------------ |
| Name                           | Adam Schwalm             |
| Company name (if applicable)   | Star Lab                 |
| Title or role (if applicable)  | Software Engineer        |
| Date                           | 2018-11-28               |
| GitHub username                | ALSchwalm                |
| Website (optional)             | https://alschwalm.com    |
|
106
.github/contributors/svlandeg.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Sofie Van Landeghem  |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 29 Nov 2018          |
| GitHub username                | svlandeg             |
| Website (optional)             |                      |
|
106
.github/contributors/wxv.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Jason Xu             |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2018-11-29           |
| GitHub username                | wxv                  |
| Website (optional)             |                      |
|
|
@ -10,7 +10,7 @@ the **fastest syntactic parser** in the world, convolutional **neural network mo
|
||||||
for tagging, parsing and **named entity recognition** and easy **deep learning**
|
for tagging, parsing and **named entity recognition** and easy **deep learning**
|
||||||
integration. It's commercial open-source software, released under the MIT license.
|
integration. It's commercial open-source software, released under the MIT license.
|
||||||
|
|
||||||
💫 **Version 2.0 out now!** `Check out the new features here. <https://spacy.io/usage/v2>`_
|
💫 **Version 2.0 out now!** `Check out the release notes here. <https://github.com/explosion/spaCy/releases>`_
|
||||||
|
|
||||||
.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis
|
.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis
|
||||||
:target: https://travis-ci.org/explosion/spaCy
|
:target: https://travis-ci.org/explosion/spaCy
|
||||||
|
@ -88,7 +88,7 @@ Features
|
||||||
* **Fastest syntactic parser** in the world
|
* **Fastest syntactic parser** in the world
|
||||||
* **Named entity** recognition
|
* **Named entity** recognition
|
||||||
* Non-destructive **tokenization**
|
* Non-destructive **tokenization**
|
||||||
* Support for **20+ languages**
|
* Support for **30+ languages**
|
||||||
* Pre-trained `statistical models <https://spacy.io/models>`_ and word vectors
|
* Pre-trained `statistical models <https://spacy.io/models>`_ and word vectors
|
||||||
* Easy **deep learning** integration
|
* Easy **deep learning** integration
|
||||||
* Part-of-speech tagging
|
* Part-of-speech tagging
|
||||||
|
@ -200,11 +200,6 @@ or manually by pointing pip to a path or URL.
|
||||||
# pip install .tar.gz archive from path or URL
|
# pip install .tar.gz archive from path or URL
|
||||||
pip install /Users/you/en_core_web_sm-2.0.0.tar.gz
|
pip install /Users/you/en_core_web_sm-2.0.0.tar.gz
|
||||||
|
|
||||||
If you have SSL certification problems, SSL customization options are described in the help:
|
|
||||||
|
|
||||||
# help for the download command
|
|
||||||
python -m spacy download --help
|
|
||||||
|
|
||||||
Loading and using models
|
Loading and using models
|
||||||
------------------------
|
------------------------
|
||||||
|
|
||||||
|
|
|
@ -7,8 +7,8 @@ murmurhash>=0.28.0,<1.1.0
|
||||||
plac<1.0.0,>=0.9.6
|
plac<1.0.0,>=0.9.6
|
||||||
ujson>=1.35
|
ujson>=1.35
|
||||||
dill>=0.2,<0.3
|
dill>=0.2,<0.3
|
||||||
regex>=2017.4.5,<2017.12.1
|
regex==2018.01.10
|
||||||
requests>=2.13.0,<3.0.0
|
requests>=2.13.0,<3.0.0
|
||||||
pytest>=3.6.0,<4.0.0
|
pytest>=4.0.0,<5.0.0
|
||||||
mock>=2.0.0,<3.0.0
|
mock>=2.0.0,<3.0.0
|
||||||
pathlib==1.0.1; python_version < "3.4"
|
pathlib==1.0.1; python_version < "3.4"
|
||||||
|
|
2
setup.py
|
@ -200,7 +200,7 @@ def setup_package():
|
||||||
'plac<1.0.0,>=0.9.6',
|
'plac<1.0.0,>=0.9.6',
|
||||||
'ujson>=1.35',
|
'ujson>=1.35',
|
||||||
'dill>=0.2,<0.3',
|
'dill>=0.2,<0.3',
|
||||||
'regex>=2017.4.5,<2017.12.1',
|
'regex==2018.01.10',
|
||||||
'requests>=2.13.0,<3.0.0',
|
'requests>=2.13.0,<3.0.0',
|
||||||
'pathlib==1.0.1; python_version < "3.4"'],
|
'pathlib==1.0.1; python_version < "3.4"'],
|
||||||
extras_require={
|
extras_require={
|
||||||
|
|
|
@ -141,7 +141,7 @@ _regular_exp += ["^{prefix}[{hyphen}][{alpha}][{alpha}{elision}{other_hyphen}\-]
|
||||||
elision=ELISION, alpha=ALPHA_LOWER)
|
elision=ELISION, alpha=ALPHA_LOWER)
|
||||||
for p in _hyphen_prefix]
|
for p in _hyphen_prefix]
|
||||||
_regular_exp += ["^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$".format(
|
_regular_exp += ["^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$".format(
|
||||||
prefix=p, elision=HYPHENS, hyphen=_other_hyphens, alpha=ALPHA_LOWER)
|
prefix=p, elision=ELISION, hyphen=_other_hyphens, alpha=ALPHA_LOWER)
|
||||||
for p in _elision_prefix]
|
for p in _elision_prefix]
|
||||||
_regular_exp.append(URL_PATTERN)
|
_regular_exp.append(URL_PATTERN)
|
||||||
|
|
||||||
|
|
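Note on the hunk above: it changes the keyword argument `elision=HYPHENS` to `elision=ELISION` in the pattern built for elision prefixes. The sketch below illustrates why that matters. It is not spaCy's code: the character sets `ELISION`, `HYPHENS` and `ALPHA_LOWER` are simplified, hypothetical stand-ins, and the prefix `entr` is chosen only for illustration.

```python
import re

# Hypothetical, simplified stand-ins for the character classes used in the
# diff; the real values are defined elsewhere in spaCy and are not shown here.
ELISION = "'\u2019"            # straight and typographic apostrophe
HYPHENS = "\u2013\u2014"       # en dash and em dash only (assumption)
ALPHA_LOWER = "a-zàâçéèêëîïôûùü"

# Same shape as the template in the hunk, written as a raw string.
TEMPLATE = r"^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$"


def build(prefix, elision):
    """Compile the elision-prefix pattern with the given elision characters."""
    return re.compile(
        TEMPLATE.format(
            prefix=prefix, elision=elision, hyphen=HYPHENS, alpha=ALPHA_LOWER
        )
    )


buggy = build("entr", HYPHENS)   # old call: hyphens passed as `elision`
fixed = build("entr", ELISION)   # new call: apostrophes passed as `elision`

print(bool(buggy.match("entr'ouvert")))
print(bool(fixed.match("entr'ouvert")))
```

With these stand-in values, the old call only accepts a dash after the prefix, so an apostrophe form such as `entr'ouvert` is rejected, while the corrected call accepts it.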
|
@ -33,7 +33,6 @@ def test_de_tokenizer_norm_exceptions(de_tokenizer, text, norms):
|
||||||
assert [token.norm_ for token in tokens] == norms
|
assert [token.norm_ for token in tokens] == norms
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
@pytest.mark.parametrize('text,norm', [("daß", "dass")])
|
@pytest.mark.parametrize('text,norm', [("daß", "dass")])
|
||||||
def test_de_lex_attrs_norm_exceptions(de_tokenizer, text, norm):
|
def test_de_lex_attrs_norm_exceptions(de_tokenizer, text, norm):
|
||||||
tokens = de_tokenizer(text)
|
tokens = de_tokenizer(text)
|
||||||
|
|
|
@ -61,7 +61,7 @@ def test_en_sbd_serialization_projective(EN):
|
||||||
|
|
||||||
|
|
||||||
TEST_CASES = [
|
TEST_CASES = [
|
||||||
pytest.mark.xfail(("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."])),
|
pytest.param("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."], marks=pytest.mark.xfail()),
|
||||||
("What is your name? My name is Jonas.", ["What is your name?", "My name is Jonas."]),
|
("What is your name? My name is Jonas.", ["What is your name?", "My name is Jonas."]),
|
||||||
("There it is! I found it.", ["There it is!", "I found it."]),
|
("There it is! I found it.", ["There it is!", "I found it."]),
|
||||||
("My name is Jonas E. Smith.", ["My name is Jonas E. Smith."]),
|
("My name is Jonas E. Smith.", ["My name is Jonas E. Smith."]),
|
||||||
|
@ -71,48 +71,48 @@ TEST_CASES = [
|
||||||
("Let's ask Jane and co. They should know.", ["Let's ask Jane and co.", "They should know."]),
|
("Let's ask Jane and co. They should know.", ["Let's ask Jane and co.", "They should know."]),
|
||||||
("They closed the deal with Pitt, Briggs & Co. It closed yesterday.", ["They closed the deal with Pitt, Briggs & Co.", "It closed yesterday."]),
|
("They closed the deal with Pitt, Briggs & Co. It closed yesterday.", ["They closed the deal with Pitt, Briggs & Co.", "It closed yesterday."]),
|
||||||
("I can see Mt. Fuji from here.", ["I can see Mt. Fuji from here."]),
|
("I can see Mt. Fuji from here.", ["I can see Mt. Fuji from here."]),
|
||||||
pytest.mark.xfail(("St. Michael's Church is on 5th st. near the light.", ["St. Michael's Church is on 5th st. near the light."])),
|
pytest.param("St. Michael's Church is on 5th st. near the light.", ["St. Michael's Church is on 5th st. near the light."], marks=pytest.mark.xfail()),
|
||||||
("That is JFK Jr.'s book.", ["That is JFK Jr.'s book."]),
|
("That is JFK Jr.'s book.", ["That is JFK Jr.'s book."]),
|
||||||
("I visited the U.S.A. last year.", ["I visited the U.S.A. last year."]),
|
("I visited the U.S.A. last year.", ["I visited the U.S.A. last year."]),
|
||||||
("I live in the E.U. How about you?", ["I live in the E.U.", "How about you?"]),
|
("I live in the E.U. How about you?", ["I live in the E.U.", "How about you?"]),
|
||||||
("I live in the U.S. How about you?", ["I live in the U.S.", "How about you?"]),
|
("I live in the U.S. How about you?", ["I live in the U.S.", "How about you?"]),
|
||||||
("I work for the U.S. Government in Virginia.", ["I work for the U.S. Government in Virginia."]),
|
("I work for the U.S. Government in Virginia.", ["I work for the U.S. Government in Virginia."]),
|
||||||
("I have lived in the U.S. for 20 years.", ["I have lived in the U.S. for 20 years."]),
|
("I have lived in the U.S. for 20 years.", ["I have lived in the U.S. for 20 years."]),
|
||||||
pytest.mark.xfail(("At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.", ["At 5 a.m. Mr. Smith went to the bank.", "He left the bank at 6 P.M.", "Mr. Smith then went to the store."])),
|
pytest.param("At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.", ["At 5 a.m. Mr. Smith went to the bank.", "He left the bank at 6 P.M.", "Mr. Smith then went to the store."], marks=pytest.mark.xfail()),
|
||||||
("She has $100.00 in her bag.", ["She has $100.00 in her bag."]),
|
("She has $100.00 in her bag.", ["She has $100.00 in her bag."]),
|
||||||
("She has $100.00. It is in her bag.", ["She has $100.00.", "It is in her bag."]),
|
("She has $100.00. It is in her bag.", ["She has $100.00.", "It is in her bag."]),
|
||||||
("He teaches science (He previously worked for 5 years as an engineer.) at the local University.", ["He teaches science (He previously worked for 5 years as an engineer.) at the local University."]),
|
("He teaches science (He previously worked for 5 years as an engineer.) at the local University.", ["He teaches science (He previously worked for 5 years as an engineer.) at the local University."]),
|
||||||
("Her email is Jane.Doe@example.com. I sent her an email.", ["Her email is Jane.Doe@example.com.", "I sent her an email."]),
|
("Her email is Jane.Doe@example.com. I sent her an email.", ["Her email is Jane.Doe@example.com.", "I sent her an email."]),
|
||||||
("The site is: https://www.example.50.com/new-site/awesome_content.html. Please check it out.", ["The site is: https://www.example.50.com/new-site/awesome_content.html.", "Please check it out."]),
|
("The site is: https://www.example.50.com/new-site/awesome_content.html. Please check it out.", ["The site is: https://www.example.50.com/new-site/awesome_content.html.", "Please check it out."]),
|
||||||
pytest.mark.xfail(("She turned to him, 'This is great.' she said.", ["She turned to him, 'This is great.' she said."])),
|
pytest.param("She turned to him, 'This is great.' she said.", ["She turned to him, 'This is great.' she said."], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(('She turned to him, "This is great." she said.', ['She turned to him, "This is great." she said.'])),
|
pytest.param('She turned to him, "This is great." she said.', ['She turned to him, "This is great." she said.'], marks=pytest.mark.xfail()),
|
||||||
('She turned to him, "This is great." She held the book out to show him.', ['She turned to him, "This is great."', "She held the book out to show him."]),
|
('She turned to him, "This is great." She held the book out to show him.', ['She turned to him, "This is great."', "She held the book out to show him."]),
|
||||||
("Hello!! Long time no see.", ["Hello!!", "Long time no see."]),
|
("Hello!! Long time no see.", ["Hello!!", "Long time no see."]),
|
||||||
("Hello?? Who is there?", ["Hello??", "Who is there?"]),
|
("Hello?? Who is there?", ["Hello??", "Who is there?"]),
|
||||||
("Hello!? Is that you?", ["Hello!?", "Is that you?"]),
|
("Hello!? Is that you?", ["Hello!?", "Is that you?"]),
|
||||||
("Hello?! Is that you?", ["Hello?!", "Is that you?"]),
|
("Hello?! Is that you?", ["Hello?!", "Is that you?"]),
|
||||||
pytest.mark.xfail(("1.) The first item 2.) The second item", ["1.) The first item", "2.) The second item"])),
|
pytest.param("1.) The first item 2.) The second item", ["1.) The first item", "2.) The second item"], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("1.) The first item. 2.) The second item.", ["1.) The first item.", "2.) The second item."])),
|
pytest.param("1.) The first item. 2.) The second item.", ["1.) The first item.", "2.) The second item."], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("1) The first item 2) The second item", ["1) The first item", "2) The second item"])),
|
pytest.param("1) The first item 2) The second item", ["1) The first item", "2) The second item"], marks=pytest.mark.xfail()),
|
||||||
("1) The first item. 2) The second item.", ["1) The first item.", "2) The second item."]),
|
("1) The first item. 2) The second item.", ["1) The first item.", "2) The second item."]),
|
||||||
pytest.mark.xfail(("1. The first item 2. The second item", ["1. The first item", "2. The second item"])),
|
pytest.param("1. The first item 2. The second item", ["1. The first item", "2. The second item"], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("1. The first item. 2. The second item.", ["1. The first item.", "2. The second item."])),
|
pytest.param("1. The first item. 2. The second item.", ["1. The first item.", "2. The second item."], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("• 9. The first item • 10. The second item", ["• 9. The first item", "• 10. The second item"])),
|
pytest.param("• 9. The first item • 10. The second item", ["• 9. The first item", "• 10. The second item"], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("⁃9. The first item ⁃10. The second item", ["⁃9. The first item", "⁃10. The second item"])),
|
pytest.param("⁃9. The first item ⁃10. The second item", ["⁃9. The first item", "⁃10. The second item"], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("a. The first item b. The second item c. The third list item", ["a. The first item", "b. The second item", "c. The third list item"])),
|
pytest.param("a. The first item b. The second item c. The third list item", ["a. The first item", "b. The second item", "c. The third list item"], marks=pytest.mark.xfail()),
|
||||||
("This is a sentence\ncut off in the middle because pdf.", ["This is a sentence\ncut off in the middle because pdf."]),
|
("This is a sentence\ncut off in the middle because pdf.", ["This is a sentence\ncut off in the middle because pdf."]),
|
||||||
("It was a cold \nnight in the city.", ["It was a cold \nnight in the city."]),
|
("It was a cold \nnight in the city.", ["It was a cold \nnight in the city."]),
|
||||||
pytest.mark.xfail(("features\ncontact manager\nevents, activities\n", ["features", "contact manager", "events, activities"])),
|
pytest.param("features\ncontact manager\nevents, activities\n", ["features", "contact manager", "events, activities"], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("You can find it at N°. 1026.253.553. That is where the treasure is.", ["You can find it at N°. 1026.253.553.", "That is where the treasure is."])),
|
pytest.param("You can find it at N°. 1026.253.553. That is where the treasure is.", ["You can find it at N°. 1026.253.553.", "That is where the treasure is."], marks=pytest.mark.xfail()),
|
||||||
("She works at Yahoo! in the accounting department.", ["She works at Yahoo! in the accounting department."]),
|
("She works at Yahoo! in the accounting department.", ["She works at Yahoo! in the accounting department."]),
|
||||||
("We make a good team, you and I. Did you see Albert I. Jones yesterday?", ["We make a good team, you and I.", "Did you see Albert I. Jones yesterday?"]),
|
("We make a good team, you and I. Did you see Albert I. Jones yesterday?", ["We make a good team, you and I.", "Did you see Albert I. Jones yesterday?"]),
|
||||||
("Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”", ["Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”"]),
|
("Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”", ["Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”"]),
|
||||||
pytest.mark.xfail((""""Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""", ['"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'])),
|
pytest.param(""""Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""", ['"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'], marks=pytest.mark.xfail()),
|
||||||
("If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . . Next sentence.", ["If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . .", "Next sentence."]),
|
("If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . . Next sentence.", ["If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . .", "Next sentence."]),
|
||||||
("I never meant that.... She left the store.", ["I never meant that....", "She left the store."]),
|
("I never meant that.... She left the store.", ["I never meant that....", "She left the store."]),
|
||||||
pytest.mark.xfail(("I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.", ["I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."])),
|
pytest.param("I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.", ["I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .", ["One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds.", ". . . The practice was not abandoned. . . ."])),
|
pytest.param("One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .", ["One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds.", ". . . The practice was not abandoned. . . ."], marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail(("Hello world.Today is Tuesday.Mr. Smith went to the store and bought 1,000.That is a lot.", ["Hello world.", "Today is Tuesday.", "Mr. Smith went to the store and bought 1,000.", "That is a lot."]))
|
pytest.param("Hello world.Today is Tuesday.Mr. Smith went to the store and bought 1,000.That is a lot.", ["Hello world.", "Today is Tuesday.", "Mr. Smith went to the store and bought 1,000.", "That is a lot."], marks=pytest.mark.xfail())
|
||||||
]
|
]
|
||||||
|
|
||||||
@pytest.mark.skip
|
@pytest.mark.skip
|
||||||
|
|
|
@ -29,7 +29,7 @@ untimely death" of the rapier-tongued Scottish barrister and parliamentarian.
|
||||||
("""Yes! "I'd rather have a walk", Ms. Comble sighed. """, 15),
|
("""Yes! "I'd rather have a walk", Ms. Comble sighed. """, 15),
|
||||||
("""'Me too!', Mr. P. Delaware cried. """, 11),
|
("""'Me too!', Mr. P. Delaware cried. """, 11),
|
||||||
("They ran about 10km.", 6),
|
("They ran about 10km.", 6),
|
||||||
pytest.mark.xfail(("But then the 6,000-year ice age came...", 10))])
|
pytest.param("But then the 6,000-year ice age came...", 10, marks=pytest.mark.xfail())])
|
||||||
def test_en_tokenizer_handles_cnts(en_tokenizer, text, length):
|
def test_en_tokenizer_handles_cnts(en_tokenizer, text, length):
|
||||||
tokens = en_tokenizer(text)
|
tokens = en_tokenizer(text)
|
||||||
assert len(tokens) == length
|
assert len(tokens) == length
|
||||||
|
|
|
@ -11,7 +11,7 @@ def fr_tokenizer():
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize('text', ["aujourd'hui", "Aujourd'hui", "prud'hommes",
|
@pytest.mark.parametrize('text', ["aujourd'hui", "Aujourd'hui", "prud'hommes",
|
||||||
"prud’hommal"])
|
"prud’hommal", "entr'amis"])
|
||||||
def test_tokenizer_infix_exceptions(fr_tokenizer, text):
|
def test_tokenizer_infix_exceptions(fr_tokenizer, text):
|
||||||
tokens = fr_tokenizer(text)
|
tokens = fr_tokenizer(text)
|
||||||
assert len(tokens) == 1
|
assert len(tokens) == 1
|
||||||
|
|
|
@ -5,11 +5,11 @@ import pytest
|
||||||
|
|
||||||
DEFAULT_TESTS = [
|
DEFAULT_TESTS = [
|
||||||
('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']),
|
('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']),
|
||||||
pytest.mark.xfail(('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'])),
|
pytest.param('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'], marks=pytest.mark.xfail()),
|
||||||
('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']),
|
('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']),
|
||||||
('A pl. rovidites.', ['A', 'pl.', 'rovidites', '.']),
|
('A pl. rovidites.', ['A', 'pl.', 'rovidites', '.']),
|
||||||
('A S.M.A.R.T. szo.', ['A', 'S.M.A.R.T.', 'szo', '.']),
|
('A S.M.A.R.T. szo.', ['A', 'S.M.A.R.T.', 'szo', '.']),
|
||||||
pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])),
|
pytest.param('A .hu.', ['A', '.hu', '.'], marks=pytest.mark.xfail()),
|
||||||
('Az egy.ketto.', ['Az', 'egy.ketto', '.']),
|
('Az egy.ketto.', ['Az', 'egy.ketto', '.']),
|
||||||
('A pl.', ['A', 'pl.']),
|
('A pl.', ['A', 'pl.']),
|
||||||
('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']),
|
('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']),
|
||||||
|
@ -227,11 +227,11 @@ QUOTE_TESTS = [
|
||||||
|
|
||||||
DOT_TESTS = [
|
DOT_TESTS = [
|
||||||
('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']),
|
('N. kormányzósági\nszékhely.', ['N.', 'kormányzósági', 'székhely', '.']),
|
||||||
pytest.mark.xfail(('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'])),
|
pytest.param('A .hu egy tld.', ['A', '.hu', 'egy', 'tld', '.'], marks=pytest.mark.xfail()),
|
||||||
('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']),
|
('Az egy.ketto pelda.', ['Az', 'egy.ketto', 'pelda', '.']),
|
||||||
('A pl. rövidítés.', ['A', 'pl.', 'rövidítés', '.']),
|
('A pl. rövidítés.', ['A', 'pl.', 'rövidítés', '.']),
|
||||||
('A S.M.A.R.T. szó.', ['A', 'S.M.A.R.T.', 'szó', '.']),
|
('A S.M.A.R.T. szó.', ['A', 'S.M.A.R.T.', 'szó', '.']),
|
||||||
pytest.mark.xfail(('A .hu.', ['A', '.hu', '.'])),
|
pytest.param('A .hu.', ['A', '.hu', '.'], marks=pytest.mark.xfail()),
|
||||||
('Az egy.ketto.', ['Az', 'egy.ketto', '.']),
|
('Az egy.ketto.', ['Az', 'egy.ketto', '.']),
|
||||||
('A pl.', ['A', 'pl.']),
|
('A pl.', ['A', 'pl.']),
|
||||||
('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']),
|
('A S.M.A.R.T.', ['A', 'S.M.A.R.T.']),
|
||||||
|
|
|
@ -7,7 +7,6 @@ import pytest
|
||||||
from ...cli.train import train
|
from ...cli.train import train
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
def test_cli_trained_model_can_be_saved(tmpdir):
|
def test_cli_trained_model_can_be_saved(tmpdir):
|
||||||
lang = 'nl'
|
lang = 'nl'
|
||||||
output_dir = str(tmpdir)
|
output_dir = str(tmpdir)
|
|
@ -7,7 +7,6 @@ from ...vocab import Vocab
|
||||||
from ...tokens import Doc, Span
|
from ...tokens import Doc, Span
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
def test_issue1547():
|
def test_issue1547():
|
||||||
"""Test that entity labels still match after merging tokens."""
|
"""Test that entity labels still match after merging tokens."""
|
||||||
words = ['\n', 'worda', '.', '\n', 'wordb', '-', 'Biosphere', '2', '-', ' \n']
|
words = ['\n', 'worda', '.', '\n', 'wordb', '-', 'Biosphere', '2', '-', ' \n']
|
||||||
|
|
|
@ -6,7 +6,7 @@ from ...vocab import Vocab
|
||||||
from ...tokens import Doc
|
from ...tokens import Doc
|
||||||
from ...matcher import Matcher
|
from ...matcher import Matcher
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
def test_issue1945():
|
def test_issue1945():
|
||||||
text = "a a a"
|
text = "a a a"
|
||||||
matcher = Matcher(Vocab())
|
matcher = Matcher(Vocab())
|
||||||
|
|
|
@ -4,7 +4,6 @@ import pytest
|
||||||
from ...gold import iob_to_biluo
|
from ...gold import iob_to_biluo
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
@pytest.mark.parametrize('tags', [('B-ORG', 'L-ORG'),
|
@pytest.mark.parametrize('tags', [('B-ORG', 'L-ORG'),
|
||||||
('B-PERSON', 'I-PERSON', 'L-PERSON'),
|
('B-PERSON', 'I-PERSON', 'L-PERSON'),
|
||||||
('U-BRAWLER', 'U-BRAWLER')])
|
('U-BRAWLER', 'U-BRAWLER')])
|
||||||
|
@ -13,21 +12,18 @@ def test_issue2385_biluo(tags):
|
||||||
assert iob_to_biluo(tags) == list(tags)
|
assert iob_to_biluo(tags) == list(tags)
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
@pytest.mark.parametrize('tags', [('B-BRAWLER', 'I-BRAWLER', 'I-BRAWLER')])
|
@pytest.mark.parametrize('tags', [('B-BRAWLER', 'I-BRAWLER', 'I-BRAWLER')])
|
||||||
def test_issue2385_iob_bcharacter(tags):
|
def test_issue2385_iob_bcharacter(tags):
|
||||||
"""fix bug in labels with a 'b' character"""
|
"""fix bug in labels with a 'b' character"""
|
||||||
assert iob_to_biluo(tags) == ['B-BRAWLER', 'I-BRAWLER', 'L-BRAWLER']
|
assert iob_to_biluo(tags) == ['B-BRAWLER', 'I-BRAWLER', 'L-BRAWLER']
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
@pytest.mark.parametrize('tags', [('I-ORG', 'I-ORG', 'B-ORG')])
|
@pytest.mark.parametrize('tags', [('I-ORG', 'I-ORG', 'B-ORG')])
|
||||||
def test_issue2385_iob1(tags):
|
def test_issue2385_iob1(tags):
|
||||||
"""maintain support for iob1 format"""
|
"""maintain support for iob1 format"""
|
||||||
assert iob_to_biluo(tags) == ['B-ORG', 'L-ORG', 'U-ORG']
|
assert iob_to_biluo(tags) == ['B-ORG', 'L-ORG', 'U-ORG']
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.xfail
|
|
||||||
@pytest.mark.parametrize('tags', [('B-PERSON', 'I-PERSON', 'B-PERSON')])
|
@pytest.mark.parametrize('tags', [('B-PERSON', 'I-PERSON', 'B-PERSON')])
|
||||||
def test_issue2385_iob2(tags):
|
def test_issue2385_iob2(tags):
|
||||||
"""maintain support for iob2 format"""
|
"""maintain support for iob2 format"""
|
||||||
|
|
|
@ -47,16 +47,16 @@ URLS_SHOULD_MATCH = [
|
||||||
"http://223.255.255.254",
|
"http://223.255.255.254",
|
||||||
"http://a.b--c.de/", # this is a legit domain name see: https://gist.github.com/dperini/729294 comment on 9/9/2014
|
"http://a.b--c.de/", # this is a legit domain name see: https://gist.github.com/dperini/729294 comment on 9/9/2014
|
||||||
|
|
||||||
pytest.mark.xfail("http://foo.com/blah_blah_(wikipedia)"),
|
pytest.param("http://foo.com/blah_blah_(wikipedia)", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://foo.com/blah_blah_(wikipedia)_(again)"),
|
pytest.param("http://foo.com/blah_blah_(wikipedia)_(again)", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://⌘.ws"),
|
pytest.param("http://⌘.ws", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://⌘.ws/"),
|
pytest.param("http://⌘.ws/", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://☺.damowmow.com/"),
|
pytest.param("http://☺.damowmow.com/", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://✪df.ws/123"),
|
pytest.param("http://✪df.ws/123", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://➡.ws/䨹"),
|
pytest.param("http://➡.ws/䨹", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://مثال.إختبار"),
|
pytest.param("http://مثال.إختبار", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://例子.测试"),
|
pytest.param("http://例子.测试", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://उदाहरण.परीक्षा"),
|
pytest.param("http://उदाहरण.परीक्षा", marks=pytest.mark.xfail()),
|
||||||
]
|
]
|
||||||
|
|
||||||
URLS_SHOULD_NOT_MATCH = [
|
URLS_SHOULD_NOT_MATCH = [
|
||||||
|
@ -95,10 +95,10 @@ URLS_SHOULD_NOT_MATCH = [
|
||||||
"http://10.1.1.1",
|
"http://10.1.1.1",
|
||||||
"NASDAQ:GOOG",
|
"NASDAQ:GOOG",
|
||||||
|
|
||||||
pytest.mark.xfail("foo.com"),
|
pytest.param("foo.com", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://1.1.1.1.1"),
|
pytest.param("http://1.1.1.1.1", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://www.foo.bar./"),
|
pytest.param("http://www.foo.bar./", marks=pytest.mark.xfail()),
|
||||||
pytest.mark.xfail("http://-a.b.co"),
|
pytest.param("http://-a.b.co", marks=pytest.mark.xfail()),
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -297,7 +297,7 @@ cdef class Vocab:
|
||||||
|
|
||||||
self.vectors = Vectors(data=keep, keys=keys)
|
self.vectors = Vectors(data=keep, keys=keys)
|
||||||
|
|
||||||
syn_keys, syn_rows, scores = self.vectors.most_similar(toss)
|
syn_keys, syn_rows, scores = self.vectors.most_similar(toss, batch_size=batch_size)
|
||||||
|
|
||||||
remap = {}
|
remap = {}
|
||||||
for i, key in enumerate(keys[nr_row:]):
|
for i, key in enumerate(keys[nr_row:]):
|
||||||
|
|
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
p
|
p
|
||||||
| Models trained on the
|
| Models trained on the
|
||||||
| #[+a("https://catalog.ldc.upenn.edu/ldc2013t19") OntoNotes 5] corpus
|
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus
|
||||||
| support the following entity types:
|
| support the following entity types:
|
||||||
|
|
||||||
+table(["Type", "Description"])
|
+table(["Type", "Description"])
|
||||||
|
|
|
@ -352,6 +352,7 @@ p Retokenize the document, such that the span is merged into a single token.
|
||||||
+h(2, "ents") Span.ents
|
+h(2, "ents") Span.ents
|
||||||
+tag property
|
+tag property
|
||||||
+tag-model("NER")
|
+tag-model("NER")
|
||||||
|
+tag-new("2.0.12")
|
||||||
|
|
||||||
p
|
p
|
||||||
| Iterate over the entities in the span. Yields named-entity
|
| Iterate over the entities in the span. Yields named-entity
|
||||||
|
|
|
@ -714,7 +714,7 @@ p The L2 norm of the token's vector representation.
|
||||||
+cell bool
|
+cell bool
|
||||||
+cell
|
+cell
|
||||||
| Does the token consist of ASCII characters? Equivalent to
|
| Does the token consist of ASCII characters? Equivalent to
|
||||||
| #[code [any(ord(c) >= 128 for c in token.text)]].
|
| #[code all(ord(c) < 128 for c in token.text)].
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell #[code is_digit]
|
+cell #[code is_digit]
|
||||||
|
|
|
@ -31,8 +31,8 @@ p
|
||||||
nlp = spacy.blank('fi') # blank instance
|
nlp = spacy.blank('fi') # blank instance
|
||||||
|
|
||||||
+table(["Language", "Code", "Language data"])
|
+table(["Language", "Code", "Language data"])
|
||||||
for lang, code in LANGUAGES
|
- var sorted_langs = Object.assign({}, ...Object.keys(LANGUAGES).filter(key => !MODELS[key]).sort().map(key => ({ [key]: LANGUAGES[key] })))
|
||||||
if !Object.keys(MODELS).includes(code)
|
for lang, code in sorted_langs
|
||||||
+row
|
+row
|
||||||
+cell #{LANGUAGES[code]}
|
+cell #{LANGUAGES[code]}
|
||||||
+cell #[code=code]
|
+cell #[code=code]
|
||||||
|
|
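As an illustration of the `is_ascii` documentation fix in this diff: the old text gave `[any(ord(c) >= 128 for c in token.text)]`, which is both inverted and wrapped in a list, while the new text gives `all(ord(c) < 128 for c in token.text)`. The sketch below compares the two on plain strings. It assumes plain `str` values standing in for `token.text`, and the helper names `is_ascii` and `old_doc_expression` are hypothetical, not part of spaCy.

```python
def is_ascii(text):
    """Corrected expression: True when every character is 7-bit ASCII."""
    return all(ord(c) < 128 for c in text)


def old_doc_expression(text):
    """Previously documented expression, kept verbatim for comparison.

    It returns a one-element list, so it is always truthy, and the
    boolean inside is the negation of the intended check.
    """
    return [any(ord(c) >= 128 for c in text)]


# Strings taken from test data elsewhere in this diff.
for sample in ["spaCy", "székhely", "prud’hommal"]:
    print(sample, is_ascii(sample), old_doc_expression(sample))
```

The inner value of the old expression is always the opposite of the corrected one, which is why the documentation was changed rather than just unwrapped from the list.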