Mirror of https://github.com/explosion/spaCy.git, synced 2024-11-11 20:28:20 +03:00

Merge branch 'master' into spacy.io

This commit is contained in: commit 596c7718b2
106 .github/contributors/munozbravo.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an "x" on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Germán Muñoz         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-06-01           |
| GitHub username                | munozbravo           |
| Website (optional)             |                      |
spacy/lang/es/__init__.py

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
 from .lemmatizer import LOOKUP
 from .syntax_iterators import SYNTAX_ITERATORS

@@ -16,6 +17,7 @@ from ...util import update_exc, add_lookups

 class SpanishDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = lambda text: "es"
     lex_attr_getters[NORM] = add_lookups(
         Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
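The wiring in `SpanishDefaults` follows spaCy's usual pattern: copy the shared base getter dict, then overlay the language-specific getters. A minimal, dependency-free sketch of that overlay pattern (`BASE_GETTERS` and the getter names here are stand-ins for illustration, not spaCy's real objects):

```python
# Sketch of the getter-overlay pattern used by SpanishDefaults above.
# BASE_GETTERS stands in for Language.Defaults.lex_attr_getters.
BASE_GETTERS = {
    "lang": lambda text: "xx",              # generic default
    "is_upper": lambda text: text.isupper(),
}

# Language-specific getters, analogous to the LEX_ATTRS dict from lex_attrs.py.
LEX_ATTRS = {"like_num": lambda text: text.isdigit()}

lex_attr_getters = dict(BASE_GETTERS)   # copy so the shared defaults stay untouched
lex_attr_getters.update(LEX_ATTRS)      # overlay the language-specific getters
lex_attr_getters["lang"] = lambda text: "es"  # override a single entry

print(lex_attr_getters["like_num"]("42"))  # True
print(lex_attr_getters["lang"](""))        # es
print(BASE_GETTERS["lang"](""))            # xx (base dict unchanged)
```

Copying with `dict(...)` before `update(...)` is what keeps the base defaults shared across languages untouched.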
59 spacy/lang/es/lex_attrs.py Normal file
@@ -0,0 +1,59 @@
# coding: utf8
from __future__ import unicode_literals

from ...attrs import LIKE_NUM


_num_words = [
    "cero",
    "uno",
    "dos",
    "tres",
    "cuatro",
    "cinco",
    "seis",
    "siete",
    "ocho",
    "nueve",
    "diez",
    "once",
    "doce",
    "trece",
    "catorce",
    "quince",
    "dieciséis",
    "diecisiete",
    "dieciocho",
    "diecinueve",
    "veinte",
    "treinta",
    "cuarenta",
    "cincuenta",
    "sesenta",
    "setenta",
    "ochenta",
    "noventa",
    "cien",
    "mil",
    "millón",
    "billón",
    "trillón",
]


def like_num(text):
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    if text.lower() in _num_words:
        return True
    return False


LEX_ATTRS = {LIKE_NUM: like_num}
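The `like_num` helper can be exercised on its own, independent of the spaCy pipeline. A standalone sketch of the helper above, with the Spanish number-word list abbreviated for brevity:

```python
# Standalone sketch of the like_num helper from lex_attrs.py above;
# _num_words is abbreviated here for brevity.
_num_words = ["cero", "uno", "dos", "tres", "diez", "cien", "mil", "millón"]


def like_num(text):
    # Strip a single leading sign before inspecting the digits.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Drop separators so strings like "1.000" or "1,000" pass isdigit().
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as "3/4" also count as number-like.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Finally, match spelled-out numbers case-insensitively.
    return text.lower() in _num_words


print(like_num("1.000"))  # True
print(like_num("3/4"))    # True
print(like_num("Mil"))    # True
print(like_num("perro"))  # False
```

Note that the word-list lookup is exact after lowercasing, so inflected forms like "millones" would need their own entries.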
15 spacy/tests/regression/test_issue3803.py Normal file
@@ -0,0 +1,15 @@
# coding: utf8
from __future__ import unicode_literals

import pytest

from spacy.lang.es import Spanish


def test_issue3803():
    """Test that Spanish num-like tokens have like_num set to True."""
    nlp = Spanish()
    text = "2 dos 1000 mil 12 doce"
    doc = nlp(text)

    assert [t.like_num for t in doc] == [True, True, True, True, True, True]
@@ -1,22 +1,107 @@
 {
     "resources": [
         {
             "id": "nlp-architect",
             "title": "NLP Architect",
             "slogan": "Python lib for exploring Deep NLP & NLU by Intel AI",
             "github": "NervanaSystems/nlp-architect",
             "pip": "nlp-architect",
             "thumb": "https://i.imgur.com/vMideRx.png",
             "category": ["standalone", "research"],
             "tags": ["pytorch"]
         },
         {
             "id": "NeuroNER",
             "title": "NeuroNER",
             "slogan": "Named-entity recognition using neural networks",
             "github": "Franck-Dernoncourt/NeuroNER",
             "pip": "pyneuroner[cpu]",
             "code_example": [
                 "from neuroner import neuromodel",
                 "nn = neuromodel.NeuroNER(train_model=False, use_pretrained_model=True)"
             ],
             "category": ["ner"],
             "tags": ["standalone"]
         },
         {
             "id": "NLPre",
             "title": "NLPre",
             "slogan": "Natural Language Preprocessing Library for health data and more",
             "github": "NIHOPA/NLPre",
             "pip": "nlpre",
             "code_example": [
                 "from nlpre import titlecaps, dedash, identify_parenthetical_phrases",
                 "from nlpre import replace_acronyms, replace_from_dictionary",
                 "ABBR = identify_parenthetical_phrases()(text)",
                 "parsers = [dedash(), titlecaps(), replace_acronyms(ABBR),",
                 "           replace_from_dictionary(prefix='MeSH_')]",
                 "for f in parsers:",
                 "    text = f(text)",
                 "print(text)"
             ],
             "category": ["scientific"]
         },
         {
             "id": "Chatterbot",
             "title": "Chatterbot",
             "slogan": "A machine-learning based conversational dialog engine for creating chat bots",
             "github": "gunthercox/ChatterBot",
             "pip": "chatterbot",
             "thumb": "https://i.imgur.com/eyAhwXk.jpg",
             "code_example": [
                 "from chatterbot import ChatBot",
                 "from chatterbot.trainers import ListTrainer",
                 "# Create a new chat bot named Charlie",
                 "chatbot = ChatBot('Charlie')",
                 "trainer = ListTrainer(chatbot)",
                 "trainer.train([",
                 "'Hi, can I help you?',",
                 "'Sure, I would like to book a flight to Iceland.',",
                 "'Your flight has been booked.'",
                 "])",
                 "",
                 "response = chatbot.get_response('I would like to book a flight.')"
             ],
             "author": "Gunther Cox",
             "author_links": {
                 "github": "gunthercox"
             },
             "category": ["conversational", "standalone"],
             "tags": ["chatbots"]
         },
         {
             "id": "saber",
             "title": "saber",
-            "slogan": "deep-learning based tool for information extraction in the biomedical domain",
+            "slogan": "Deep-learning based tool for information extraction in the biomedical domain",
             "github": "BaderLab/saber",
             "pip": "saber",
             "thumb": "https://raw.githubusercontent.com/BaderLab/saber/master/docs/img/saber_logo.png",
             "code_example": [
-                ">>> from saber.saber import Saber",
-                ">>> saber = Saber()",
-                ">>> saber.load('PRGE')",
+                "from saber.saber import Saber",
+                "saber = Saber()",
+                "saber.load('PRGE')",
                 "saber.annotate('The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.')"
             ],
-            "category": ["research", "biomedical"],
-            "tags": ["keras"]
+            "author": "Bader Lab, University of Toronto",
+            "category": ["scientific"],
+            "tags": ["keras", "biomedical"]
         },
         {
             "id": "alibi",
             "title": "alibi",
             "slogan": "Algorithms for monitoring and explaining machine learning models",
             "github": "SeldonIO/alibi",
             "pip": "alibi",
             "thumb": "https://i.imgur.com/YkzQHRp.png",
             "code_example": [
                 "from alibi.explainers import AnchorTabular",
                 "explainer = AnchorTabular(predict_fn, feature_names)",
                 "explainer.fit(X_train)",
                 "explainer.explain(x)"
             ],
             "author": "Seldon",
             "category": ["standalone", "research"]
         },
         {
             "id": "spacymoji",
             "slogan": "Emoji handling and meta data as a spaCy pipeline component",
@@ -160,7 +245,7 @@
             "doc = nlp(my_doc_text)"
         ],
         "author": "tc64",
-        "author_link": {
+        "author_links": {
             "github": "tc64"
         },
         "category": ["pipeline"]
@@ -363,7 +448,7 @@
         "author_links": {
             "github": "huggingface"
         },
-        "category": ["standalone", "conversational"],
+        "category": ["standalone", "conversational", "models"],
         "tags": ["coref"]
     },
     {
@@ -555,7 +640,7 @@
             "twitter": "allenai_org",
             "website": "http://allenai.org"
         },
-        "category": ["models", "research"]
+        "category": ["scientific", "models", "research"]
     },
     {
         "id": "textacy",
@@ -618,7 +703,7 @@
             "github": "ahalterman",
             "twitter": "ahalterman"
         },
-        "category": ["standalone"]
+        "category": ["standalone", "scientific"]
     },
     {
         "id": "kindred",
@@ -643,7 +728,7 @@
         "author_links": {
             "github": "jakelever"
         },
-        "category": ["standalone"]
+        "category": ["standalone", "scientific"]
     },
     {
         "id": "sense2vec",
@@ -911,6 +996,23 @@
         "author": "Aaron Kramer",
         "category": ["courses"]
     },
+    {
+        "type": "education",
+        "id": "spacy-course",
+        "title": "Advanced NLP with spaCy",
+        "slogan": "spaCy, 2019",
+        "description": "In this free interactive course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
+        "url": "https://course.spacy.io",
+        "image": "https://i.imgur.com/JC00pHW.jpg",
+        "thumb": "https://i.imgur.com/5RXLtrr.jpg",
+        "author": "Ines Montani",
+        "author_links": {
+            "twitter": "_inesmontani",
+            "github": "ines",
+            "website": "https://ines.io"
+        },
+        "category": ["courses"]
+    },
     {
         "type": "education",
         "id": "video-spacys-ner-model",
@@ -1071,7 +1173,7 @@
             "github": "ecohealthalliance",
             "website": "https://ecohealthalliance.org/"
         },
-        "category": ["research", "standalone"]
+        "category": ["scientific", "standalone"]
     },
     {
         "id": "self-attentive-parser",
@@ -1393,7 +1495,7 @@
         "url": "https://github.com/msg-systems/holmes-extractor",
         "description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural search, topic matching and supervised document classification.",
         "pip": "holmes-extractor",
-        "category": ["conversational", "research", "standalone"],
+        "category": ["conversational", "standalone"],
         "tags": ["chatbots", "text-processing"],
         "code_example": [
             "import holmes_extractor as holmes",
@@ -1432,6 +1534,11 @@
         "title": "Research",
         "description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
     },
+    {
+        "id": "scientific",
+        "title": "Scientific",
+        "description": "Frameworks and utilities for scientific text processing"
+    },
     {
         "id": "visualizers",
         "title": "Visualizers",
@@ -1451,6 +1558,11 @@
         "id": "standalone",
         "title": "Standalone",
         "description": "Self-contained libraries or tools that use spaCy under the hood"
     },
+    {
+        "id": "models",
+        "title": "Models",
+        "description": "Third-party pre-trained models for different languages and domains"
+    }
     ]
 },