From 874cd025395b9bbcfb4ab5991fdf24cc99fd95e1 Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Mon, 26 Apr 2021 17:06:32 +0200 Subject: [PATCH 001/203] Set spacy-legacy to >=3.0.5 (#7897) Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug. --- requirements.txt | 2 +- setup.cfg | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/requirements.txt b/requirements.txt index 1947dd2de..a8a15a01b 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ # Our libraries -spacy-legacy>=3.0.4,<3.1.0 +spacy-legacy>=3.0.5,<3.1.0 cymem>=2.0.2,<2.1.0 preshed>=3.0.2,<3.1.0 thinc>=8.0.3,<8.1.0 diff --git a/setup.cfg b/setup.cfg index 9e1293335..2fedd8f5c 100644 --- a/setup.cfg +++ b/setup.cfg @@ -37,7 +37,7 @@ setup_requires = thinc>=8.0.3,<8.1.0 install_requires = # Our libraries - spacy-legacy>=3.0.4,<3.1.0 + spacy-legacy>=3.0.5,<3.1.0 murmurhash>=0.28.0,<1.1.0 cymem>=2.0.2,<2.1.0 preshed>=3.0.2,<3.1.0 From 1690595e4d243378dd13542090c658429fd87d15 Mon Sep 17 00:00:00 2001 From: Janis Klaise Date: Tue, 27 Apr 2021 08:13:39 +0100 Subject: [PATCH 002/203] Update load_lookups return type and docstring (#7907) * Update load_lookups return type and docstring * Add contributor agreement --- .github/contributors/jklaise.md | 106 ++++++++++++++++++++++++++++++++ spacy/lookups.py | 8 +-- 2 files changed, 110 insertions(+), 4 deletions(-) create mode 100644 .github/contributors/jklaise.md diff --git a/.github/contributors/jklaise.md b/.github/contributors/jklaise.md new file mode 100644 index 000000000..66d77ee48 --- /dev/null +++ b/.github/contributors/jklaise.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). 
+The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI GmbH](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. 
This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. 
You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. 
+ +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name |Janis Klaise | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date |26/04/2021 | +| GitHub username |jklaise | +| Website (optional) |janisklaise.com | diff --git a/spacy/lookups.py b/spacy/lookups.py index 76535d1de..f635f0dcf 100644 --- a/spacy/lookups.py +++ b/spacy/lookups.py @@ -1,4 +1,4 @@ -from typing import Dict, Any, List, Union, Optional +from typing import Any, List, Union, Optional from pathlib import Path import srsly from preshed.bloom import BloomFilter @@ -14,16 +14,16 @@ UNSET = object() def load_lookups( lang: str, tables: List[str], strict: bool = True -) -> Optional[Dict[str, Any]]: +) -> 'Lookups': """Load the data from the spacy-lookups-data package for a given language, - if available. Returns an empty dict if there's no data or if the package + if available. Returns an empty `Lookups` container if there's no data or if the package is not installed. lang (str): The language code (corresponds to entry point exposed by the spacy-lookups-data package). tables (List[str]): Name of tables to load, e.g. ["lemma_lookup", "lemma_exc"] strict (bool): Whether to raise an error if a table doesn't exist. - RETURNS (Dict[str, Any]): The lookups, keyed by table name. + RETURNS (Lookups): The lookups container containing the loaded tables. """ # TODO: import spacy_lookups_data instead of going via entry points here? lookups = Lookups() From de6b5ed14dcb036c02e92664365ea2b1fb6cf21c Mon Sep 17 00:00:00 2001 From: Paul O'Leary McCann Date: Tue, 27 Apr 2021 16:16:35 +0900 Subject: [PATCH 003/203] Fix percent unk display in debug data (#7886) * Fix percent unk display This was showing (ratio %), so 10% would show as 0.10%. Fix by multiplying ratio by 100. Might want to add a warning if this is over a threshold. 
* Only show whole-integer percents --- spacy/cli/debug_data.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spacy/cli/debug_data.py b/spacy/cli/debug_data.py index 3351e53fe..1ebf65957 100644 --- a/spacy/cli/debug_data.py +++ b/spacy/cli/debug_data.py @@ -173,8 +173,8 @@ def debug_data( ) n_missing_vectors = sum(gold_train_data["words_missing_vectors"].values()) msg.warn( - "{} words in training data without vectors ({:0.2f}%)".format( - n_missing_vectors, n_missing_vectors / gold_train_data["n_words"] + "{} words in training data without vectors ({:.0f}%)".format( + n_missing_vectors, 100 * (n_missing_vectors / gold_train_data["n_words"]) ), ) msg.text( From 8007d5c8148460d08a6aa500dff0eabb0f504f23 Mon Sep 17 00:00:00 2001 From: Paul O'Leary McCann Date: Wed, 28 Apr 2021 16:17:15 +0900 Subject: [PATCH 004/203] Check if the resume path points to a directory (#7919) This came up in #7878, but if --resume-path is a directory then loading the weights will fail. On Linux this will give a straightforward error message, but on Windows it gives "Permission Denied", which is confusing. --- spacy/cli/pretrain.py | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/spacy/cli/pretrain.py b/spacy/cli/pretrain.py index 1f8fc99cc..fe3ce0dad 100644 --- a/spacy/cli/pretrain.py +++ b/spacy/cli/pretrain.py @@ -95,6 +95,13 @@ def verify_cli_args(config_path, output_dir, resume_path, epoch_resume): "then the new directory will be created for you.", ) if resume_path is not None: + if resume_path.is_dir(): + # This is necessary because Windows gives a Permission Denied when we + # try to open the directory later, which is confusing. 
See #7878 + msg.fail( + f"--resume-path should be a weights file, but {resume_path} is a directory.", + exits=True, + ) model_name = re.search(r"model\d+\.bin", str(resume_path)) if not model_name and not epoch_resume: msg.fail( From f4080983eab96a1c43a98d2553bc2a2cdea3986d Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Wed, 28 Apr 2021 10:18:24 +0200 Subject: [PATCH 005/203] Extend to cupy 9.0.0 (#7914) --- .github/azure-steps.yml | 2 +- setup.cfg | 22 +++++++++++----------- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/.github/azure-steps.yml b/.github/azure-steps.yml index 750e096d0..d536f2eb8 100644 --- a/.github/azure-steps.yml +++ b/.github/azure-steps.yml @@ -41,7 +41,7 @@ steps: displayName: "Install test requirements" - script: | - ${{ parameters.prefix }} python -m pip install -U cupy-cuda110 + ${{ parameters.prefix }} python -m pip install -U cupy-cuda110 -f https://github.com/cupy/cupy/releases/v9.0.0 ${{ parameters.prefix }} python -m pip install "torch==1.7.1+cu110" -f https://download.pytorch.org/whl/torch_stable.html displayName: "Install GPU requirements" condition: eq(${{ parameters.gpu }}, true) diff --git a/setup.cfg b/setup.cfg index 2fedd8f5c..63d603a9c 100644 --- a/setup.cfg +++ b/setup.cfg @@ -71,27 +71,27 @@ transformers = ray = spacy_ray>=0.1.0,<1.0.0 cuda = - cupy>=5.0.0b4,<9.0.0 + cupy>=5.0.0b4,<10.0.0 cuda80 = - cupy-cuda80>=5.0.0b4,<9.0.0 + cupy-cuda80>=5.0.0b4,<10.0.0 cuda90 = - cupy-cuda90>=5.0.0b4,<9.0.0 + cupy-cuda90>=5.0.0b4,<10.0.0 cuda91 = - cupy-cuda91>=5.0.0b4,<9.0.0 + cupy-cuda91>=5.0.0b4,<10.0.0 cuda92 = - cupy-cuda92>=5.0.0b4,<9.0.0 + cupy-cuda92>=5.0.0b4,<10.0.0 cuda100 = - cupy-cuda100>=5.0.0b4,<9.0.0 + cupy-cuda100>=5.0.0b4,<10.0.0 cuda101 = - cupy-cuda101>=5.0.0b4,<9.0.0 + cupy-cuda101>=5.0.0b4,<10.0.0 cuda102 = - cupy-cuda102>=5.0.0b4,<9.0.0 + cupy-cuda102>=5.0.0b4,<10.0.0 cuda110 = - cupy-cuda110>=5.0.0b4,<9.0.0 + cupy-cuda110>=5.0.0b4,<10.0.0 cuda111 = + 
cupy-cuda111>=5.0.0b4,<10.0.0 cuda112 = - cupy-cuda112>=5.0.0b4,<9.0.0 + cupy-cuda112>=5.0.0b4,<10.0.0 # Language tokenizers with external dependencies ja = sudachipy>=0.4.9 From 49aed683cce4d58baca10e7cb4fe89fbfc209a36 Mon Sep 17 00:00:00 2001 From: Sevdimali Date: Wed, 28 Apr 2021 16:42:02 +0400 Subject: [PATCH 006/203] Azerbaijani language added (#7911) --- .github/contributors/sevdimali.md | 106 ++++++++++++++++++++++ spacy/lang/az/__init__.py | 21 +++++ spacy/lang/az/examples.py | 18 ++++ spacy/lang/az/lex_attrs.py | 89 ++++++++++++++++++ spacy/lang/az/stop_words.py | 145 ++++++++++++++++++++++++++++++ 5 files changed, 379 insertions(+) create mode 100644 .github/contributors/sevdimali.md create mode 100644 spacy/lang/az/__init__.py create mode 100644 spacy/lang/az/examples.py create mode 100644 spacy/lang/az/lex_attrs.py create mode 100644 spacy/lang/az/stop_words.py diff --git a/.github/contributors/sevdimali.md b/.github/contributors/sevdimali.md new file mode 100644 index 000000000..6b96abdf8 --- /dev/null +++ b/.github/contributors/sevdimali.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. 
For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. 
With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. 
Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. + +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Sevdimali | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | 10/4/2021 | +| GitHub username | sevdimali | +| Website (optional) | https://sevdimali.me | diff --git a/spacy/lang/az/__init__.py b/spacy/lang/az/__init__.py new file mode 100644 index 000000000..6a4288d1e --- /dev/null +++ b/spacy/lang/az/__init__.py @@ -0,0 +1,21 @@ +from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS, TOKEN_MATCH +from .stop_words import STOP_WORDS +from .syntax_iterators import SYNTAX_ITERATORS +from .lex_attrs import LEX_ATTRS +from ...language import Language + + +class AzerbaijaniDefaults(Language.Defaults): + tokenizer_exceptions = TOKENIZER_EXCEPTIONS + lex_attr_getters = LEX_ATTRS + stop_words = STOP_WORDS + token_match = TOKEN_MATCH + syntax_iterators = SYNTAX_ITERATORS + + +class Azerbaijani(Language): + lang = "az" + Defaults = AzerbaijaniDefaults + + +__all__ = ["Azerbaijani"] diff --git a/spacy/lang/az/examples.py b/spacy/lang/az/examples.py new file mode 100644 index 000000000..f3331a8cb --- /dev/null +++ b/spacy/lang/az/examples.py @@ -0,0 +1,18 @@ +""" +Example sentences to test spaCy and its language models. 
+>>> from spacy.lang.az.examples import sentences +>>> docs = nlp.pipe(sentences) +""" + + +sentences = [ + "Bu bir cümlədir.", + "Necəsən?", + "Qarabağ ordeni vətən müharibəsində qələbə münasibəti ilə təsis edilmişdir.", + "Məktəbimizə Bakıdan bir tarix müəllimi gəlmişdi.", + "Atılan növbəti mərmilər lap yaxınlıqda partladı.", + "Sinqapur koronavirus baxımından ən təhlükəsiz ölkələr sırasındadır.", + "Marsda ilk sınaq uçuşu həyata keçirilib.", + "SSRİ dağılandan bəri 5 sahil dövləti Xəzərin statusunu müəyyən edə bilməyiblər.", + "Videoda beyninə xüsusi çip yerləşdirilmiş meymun əks olunub.", +] diff --git a/spacy/lang/az/lex_attrs.py b/spacy/lang/az/lex_attrs.py new file mode 100644 index 000000000..73a5e2762 --- /dev/null +++ b/spacy/lang/az/lex_attrs.py @@ -0,0 +1,89 @@ +from ...attrs import LIKE_NUM + + +# Eleven, twelve etc. are written separate: on bir, on iki + +_num_words = [ + "bir", + "iki", + "üç", + "dörd", + "beş", + "altı", + "yeddi", + "səkkiz", + "doqquz", + "on", + "iyirmi", + "otuz", + "qırx", + "əlli", + "altmış", + "yetmiş", + "səksən", + "doxsan", + "yüz", + "min", + "milyon", + "milyard", + "trilyon", + "kvadrilyon", + "kentilyon", +] + + +_ordinal_words = [ + "birinci", + "ikinci", + "üçüncü", + "dördüncü", + "beşinci", + "altıncı", + "yedinci", + "səkkizinci", + "doqquzuncu", + "onuncu", + "iyirminci", + "otuzuncu", + "qırxıncı", + "əllinci", + "altmışıncı", + "yetmişinci", + "səksəninci", + "doxsanıncı", + "yüzüncü", + "mininci", + "milyonuncu", + "milyardıncı", + "trilyonuncu", + "kvadrilyonuncu", + "kentilyonuncu", +] + +_ordinal_endings = ("inci", "ıncı", "nci", "ncı", "uncu", "üncü") + + +def like_num(text): + if text.startswith(("+", "-", "±", "~")): + text = text[1:] + text = text.replace(",", "").replace(".", "") + if text.isdigit(): + return True + if text.count("/") == 1: + num, denom = text.split("/") + if num.isdigit() and denom.isdigit(): + return True + text_lower = text.lower() + # Check cardinal number + if text_lower in 
_num_words: + return True + # Check ordinal number + if text_lower in _ordinal_words: + return True + if text_lower.endswith(_ordinal_endings): + if text_lower[:-3].isdigit() or text_lower[:-4].isdigit(): + return True + return False + + +LEX_ATTRS = {LIKE_NUM: like_num} diff --git a/spacy/lang/az/stop_words.py b/spacy/lang/az/stop_words.py new file mode 100644 index 000000000..2114939ba --- /dev/null +++ b/spacy/lang/az/stop_words.py @@ -0,0 +1,145 @@ +# Source: https://github.com/eliasdabbas/advertools/blob/master/advertools/stopwords.py +STOP_WORDS = set( + """ +amma +arasında +artıq +ay +az +bax +belə +beş +bilər +bir +biraz +biri +birşey +biz +bizim +bizlər +bu +buna +bundan +bunların +bunu +bunun +buradan +bütün +bəli +bəlkə +bəy +bəzi +bəzən +daha +dedi +deyil +dir +düz +də +dək +dən +dəqiqə +edir +edən +elə +et +etdi +etmə +etmək +faiz +gilə +görə +ha +haqqında +harada +heç +hə +həm +həmin +həmişə +hər +idi +il +ildə +ilk +ilə +in +indi +istifadə +isə +ki +kim +kimi +kimə +lakin +lap +mirşey +məhz +mən +mənə +niyə +nə +nəhayət +o +obirisi +of +olan +olar +olaraq +oldu +olduğu +olmadı +olmaz +olmuşdur +olsun +olur +on +ona +ondan +onlar +onlardan +onların +onsuzda +onu +onun +oradan +qarşı +qədər +saat +sadəcə +saniyə +siz +sizin +sizlər +sonra +səhv +sən +sənin +sənə +təəssüf +var +və +xan +xanım +xeyr +ya +yalnız +yaxşı +yeddi +yenə +yox +yoxdur +yoxsa +yəni +zaman +çox +çünki +öz +özü +üçün +əgər +əlbəttə +ən +əslində +""".split() +) From 7cf5bd072fc1ca65be2a9eb3115aa838ba83b04d Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Thu, 29 Apr 2021 16:58:54 +0200 Subject: [PATCH 007/203] Refactor util.to_ternary_int (#7944) * Refactor to avoid literal comparison with `is` * Extend tests --- spacy/tests/test_misc.py | 16 ++++++++++++++++ spacy/util.py | 12 ++++++++---- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/spacy/tests/test_misc.py b/spacy/tests/test_misc.py index 0d09999a9..b38a50f71 100644 --- a/spacy/tests/test_misc.py +++ 
b/spacy/tests/test_misc.py @@ -8,6 +8,7 @@ from spacy import prefer_gpu, require_gpu, require_cpu from spacy.ml._precomputable_affine import PrecomputableAffine from spacy.ml._precomputable_affine import _backprop_precomputable_affine_padding from spacy.util import dot_to_object, SimpleFrozenList, import_file +from spacy.util import to_ternary_int from thinc.api import Config, Optimizer, ConfigValidationError, get_current_ops from thinc.api import set_current_ops from spacy.training.batchers import minibatch_by_words @@ -386,3 +387,18 @@ def make_dummy_component( nlp = English.from_config(config) nlp.add_pipe("dummy_component") nlp.initialize() + + +def test_to_ternary_int(): + assert to_ternary_int(True) == 1 + assert to_ternary_int(None) == 0 + assert to_ternary_int(False) == -1 + assert to_ternary_int(1) == 1 + assert to_ternary_int(1.0) == 1 + assert to_ternary_int(0) == 0 + assert to_ternary_int(0.0) == 0 + assert to_ternary_int(-1) == -1 + assert to_ternary_int(5) == -1 + assert to_ternary_int(-10) == -1 + assert to_ternary_int("string") == -1 + assert to_ternary_int([0, "string"]) == -1 diff --git a/spacy/util.py b/spacy/util.py index 512c6b742..84142d5d8 100644 --- a/spacy/util.py +++ b/spacy/util.py @@ -1533,11 +1533,15 @@ def to_ternary_int(val) -> int: attributes such as SENT_START: True/1/1.0 is 1 (True), None/0/0.0 is 0 (None), any other values are -1 (False). 
""" - if isinstance(val, float): - val = int(val) - if val is True or val is 1: + if val is True: return 1 - elif val is None or val is 0: + elif val is None: + return 0 + elif val is False: + return -1 + elif val == 1: + return 1 + elif val == 0: return 0 else: return -1 From cf032ec31e38f57940edfb93f041bcd373871554 Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Thu, 29 Apr 2021 19:11:28 +0200 Subject: [PATCH 008/203] Update to catalogue>=2.0.4 (#7951) --- requirements.txt | 2 +- setup.cfg | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/requirements.txt b/requirements.txt index a8a15a01b..09d1cabda 100644 --- a/requirements.txt +++ b/requirements.txt @@ -8,7 +8,7 @@ ml_datasets>=0.2.0,<0.3.0 murmurhash>=0.28.0,<1.1.0 wasabi>=0.8.1,<1.1.0 srsly>=2.4.1,<3.0.0 -catalogue>=2.0.3,<2.1.0 +catalogue>=2.0.4,<2.1.0 typer>=0.3.0,<0.4.0 pathy>=0.3.5 # Third party dependencies diff --git a/setup.cfg b/setup.cfg index 63d603a9c..5cda00fb2 100644 --- a/setup.cfg +++ b/setup.cfg @@ -45,7 +45,7 @@ install_requires = blis>=0.4.0,<0.8.0 wasabi>=0.8.1,<1.1.0 srsly>=2.4.1,<3.0.0 - catalogue>=2.0.3,<2.1.0 + catalogue>=2.0.4,<2.1.0 typer>=0.3.0,<0.4.0 pathy>=0.3.5 # Third-party dependencies From 2320791f6dc42f7724cedc86a420572c90aa7a5c Mon Sep 17 00:00:00 2001 From: Adriane Boyd Date: Fri, 30 Apr 2021 12:21:31 +0200 Subject: [PATCH 009/203] Fix Transformer.initialize example (#7963) --- website/docs/api/transformer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/api/transformer.md b/website/docs/api/transformer.md index 5aaa1d23e..6de2b0a87 100644 --- a/website/docs/api/transformer.md +++ b/website/docs/api/transformer.md @@ -175,7 +175,7 @@ by [`Language.initialize`](/api/language#initialize). 
> > ```python > trf = nlp.add_pipe("transformer") -> trf.initialize(lambda: [], nlp=nlp) +> trf.initialize(lambda: iter([]), nlp=nlp) > ``` | Name | Description | From 12d3d0feddc4f813d1cc63ab2465e31e9c8816cc Mon Sep 17 00:00:00 2001 From: Ines Montani Date: Mon, 3 May 2021 11:48:12 +1000 Subject: [PATCH 010/203] Fix quickstart default checked of conditional fields [ci skip] --- website/src/components/quickstart.js | 3 ++- website/src/widgets/quickstart-training.js | 4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/website/src/components/quickstart.js b/website/src/components/quickstart.js index 90a8e0983..a32db8975 100644 --- a/website/src/components/quickstart.js +++ b/website/src/components/quickstart.js @@ -105,12 +105,13 @@ const Quickstart = ({ multiple, other, help, + hidden, }) => { // Optional function that's called with the value const setterFunc = setters[id] || (() => {}) // Check if dropdown should be shown const dropdownGetter = showDropdown[id] || (() => true) - return ( + return hidden ? null : (