Merge branch 'master' into spacy.io

2025-11-30 23:05:43 +03:00 · 2020-01-04 01:52:28 +01:00 · db81604d54
commit db81604d54
parent 554fbb04b0 400257a802
32 changed files with 976 additions and 123 deletions
--- a/.github/contributors/AlJohri.md
+++ b/.github/contributors/AlJohri.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Al Johri             |
+| Company name (if applicable)   | N/A                  |
+| Title or role (if applicable)  | N/A                  |
+| Date                           | December 27th, 2019  |
+| GitHub username                | AlJohri              |
+| Website (optional)             | http://aljohri.com/  |
--- a/.github/contributors/Olamyy.md
+++ b/.github/contributors/Olamyy.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [ x ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           |    Olamilekan Wahab  |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           |    8/11/2019         |
+| GitHub username                |    Olamyy            |
+| Website (optional)             |                      |
--- a/.github/contributors/iechevarria.md
+++ b/.github/contributors/iechevarria.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                 |
+|------------------------------- | --------------------- |
+| Name                           | Ivan Echevarria       |
+| Company name (if applicable)   |                       |
+| Title or role (if applicable)  |                       |
+| Date                           | 2019-12-24            |
+| GitHub username                | iechevarria           |
+| Website (optional)             | https://echevarria.io |
--- a/.github/contributors/iurshina.md
+++ b/.github/contributors/iurshina.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [ ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Anastasiia Iurshina  |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 28.12.2019           |
+| GitHub username                | iurshina             |
+| Website (optional)             |                      |
--- a/bin/spacy
+++ b/bin/spacy
@@ -1 +1,2 @@
+#! /bin/sh
 python -m spacy "$@"
--- a/spacy/_align.pyx
+++ b/spacy/_align.pyx
@@ -30,7 +30,7 @@ S[:i]   -> T[:j]   (at D[i,j])
 S[:i+1] -> T[:j]   (at D[i+1,j])
 S[:i]   -> T[:j+1] (at D[i,j+1])
    
-Further, we now we can tranform:
+Further, now we can transform:
 S[:i+1] -> S[:i] (DEL) for 1,
 T[:j+1] -> T[:j] (INS) for 1.
 S[i+1]  -> T[j+1] (SUB) for 0 or 1
--- a/spacy/displacy/__init__.py
+++ b/spacy/displacy/__init__.py
@@ -55,9 +55,10 @@ def render(
        html = RENDER_WRAPPER(html)
    if jupyter or (jupyter is None and is_in_jupyter()):
        # return HTML rendered by IPython display()
+        # See #4840 for details on span wrapper to disable mathjax
        from IPython.core.display import display, HTML

-        return display(HTML(html))
+        return display(HTML('<span class="tex2jax_ignore">{}</span>'.format(html)))
    return html


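Note on the hunk above: the change wraps the rendered markup in a span so that MathJax in the notebook skips it (see #4840). A minimal standalone sketch of that wrapping step follows. The helper name `wrap_for_jupyter` and the sample markup are hypothetical and not part of the commit, and the sketch assumes the notebook's MathJax is configured to skip elements with the `tex2jax_ignore` class.

```python
def wrap_for_jupyter(html):
    """Wrap rendered markup in a span carrying the tex2jax_ignore class.

    Hypothetical helper for illustration; the commit inlines this
    expression directly in render().
    """
    return '<span class="tex2jax_ignore">{}</span>'.format(html)


# Sample markup with dollar signs, which MathJax could otherwise
# interpret as inline math delimiters.
markup = "<div>Price: $5 and $10</div>"
wrapped = wrap_for_jupyter(markup)
print(wrapped)
```

The wrapper leaves the markup itself untouched, so the non-Jupyter return path (`return html`) is unaffected.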
--- a/spacy/errors.py
+++ b/spacy/errors.py
@@ -78,10 +78,9 @@ class Warnings(object):
    W015 = ("As of v2.1.0, the use of keyword arguments to exclude fields from "
            "being serialized or deserialized is deprecated. Please use the "
            "`exclude` argument instead. For example: exclude=['{arg}'].")
-    W016 = ("The keyword argument `n_threads` on the is now deprecated, as "
-            "the v2.x models cannot release the global interpreter lock. "
-            "Future versions may introduce a `n_process` argument for "
-            "parallel inference via multiprocessing.")
+    W016 = ("The keyword argument `n_threads` is now deprecated. As of v2.2.2, "
+            "the argument `n_process` controls parallel inference via "
+            "multiprocessing.")
    W017 = ("Alias '{alias}' already exists in the Knowledge Base.")
    W018 = ("Entity '{entity}' already exists in the Knowledge Base - "
            "ignoring the duplicate entry.")
@@ -105,6 +104,10 @@ class Warnings(object):
    W025 = ("'{name}' requires '{attr}' to be assigned, but none of the "
            "previous components in the pipeline declare that they assign it.")
    W026 = ("Unable to set all sentence boundaries from dependency parses.")
+    W027 = ("Found a large training file of {size} bytes. Note that it may "
+            "be more efficient to split your training data into multiple "
+            "smaller JSON files instead.")
+


@add_codes
--- a/spacy/gold.pyx
+++ b/spacy/gold.pyx
@@ -13,7 +13,7 @@ import srsly

 from .syntax import nonproj
 from .tokens import Doc, Span
-from .errors import Errors, AlignmentError
+from .errors import Errors, AlignmentError, user_warning, Warnings
 from .compat import path2str
 from . import util
 from .util import minibatch, itershuffle
@@ -557,12 +557,16 @@ def _json_iterate(loc):
    loc = util.ensure_path(loc)
    with loc.open("rb") as file_:
        py_raw = file_.read()
+    cdef long file_length = len(py_raw)
+    if file_length > 2 ** 30:
+        user_warning(Warnings.W027.format(size=file_length))
+
    raw = <char*>py_raw
    cdef int square_depth = 0
    cdef int curly_depth = 0
    cdef int inside_string = 0
    cdef int escape = 0
-    cdef int start = -1
+    cdef long start = -1
    cdef char c
    cdef char quote = ord('"')
    cdef char backslash = ord("\\")
@@ -570,7 +574,7 @@ def _json_iterate(loc):
    cdef char close_square = ord("]")
    cdef char open_curly = ord("{")
    cdef char close_curly = ord("}")
-    for i in range(len(py_raw)):
+    for i in range(file_length):
        c = raw[i]
        if escape:
            escape = False
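Note on the two hunks above: `_json_iterate` now warns with W027 when the training file exceeds 2 ** 30 bytes. The same check can be sketched in plain Python as below. This is an illustration under stated assumptions, not the commit's code: `warn_if_large` is a hypothetical helper, it uses the standard `warnings` module instead of spaCy's `user_warning`, and it reads the size with `os.path.getsize` whereas the commit takes `len()` of the bytes it has already read.

```python
import os
import tempfile
import warnings

# Threshold from the hunk above: 2 ** 30 bytes (1 GiB).
LARGE_FILE_BYTES = 2 ** 30

# Message text copied from W027 in spacy/errors.py.
W027 = (
    "Found a large training file of {size} bytes. Note that it may "
    "be more efficient to split your training data into multiple "
    "smaller JSON files instead."
)


def warn_if_large(path, threshold=LARGE_FILE_BYTES):
    """Emit a warning if the file at `path` is larger than `threshold` bytes.

    Hypothetical helper; returns True when a warning was issued.
    """
    size = os.path.getsize(path)
    if size > threshold:
        warnings.warn(W027.format(size=size))
        return True
    return False


# Demonstration with a tiny file and an artificially low threshold,
# so that no gigabyte-sized file is needed.
with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as tmp:
    tmp.write(b"[" + b" " * 14 + b"]")  # 16 bytes
demo_path = tmp.name

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_large(demo_path, threshold=8)

os.remove(demo_path)
```

The `threshold` parameter exists only so the sketch can be exercised on small files; the commit hard-codes the limit.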
--- a/spacy/lang/el/tag_map.py
+++ b/spacy/lang/el/tag_map.py
@@ -4249,20 +4249,20 @@ TAG_MAP = {
        "Voice": "Act",
        "Case": "Nom|Gen|Dat|Acc|Voc",
    },
-    'ADJ': {POS: ADJ},
-    'ADP': {POS: ADP},
-    'ADV': {POS: ADV},
-    'AtDf': {POS: DET},
-    'AUX': {POS: AUX},
-    'CCONJ': {POS: CCONJ},
-    'DET': {POS: DET},
-    'NOUN': {POS: NOUN},
-    'NUM': {POS: NUM},
-    'PART': {POS: PART},
-    'PRON': {POS: PRON},
-    'PROPN': {POS: PROPN},
-    'SCONJ': {POS: SCONJ},
-    'SYM': {POS: SYM},
-    'VERB': {POS: VERB},
-    'X': {POS: X},
+    "ADJ": {POS: ADJ},
+    "ADP": {POS: ADP},
+    "ADV": {POS: ADV},
+    "AtDf": {POS: DET},
+    "AUX": {POS: AUX},
+    "CCONJ": {POS: CCONJ},
+    "DET": {POS: DET},
+    "NOUN": {POS: NOUN},
+    "NUM": {POS: NUM},
+    "PART": {POS: PART},
+    "PRON": {POS: PRON},
+    "PROPN": {POS: PROPN},
+    "SCONJ": {POS: SCONJ},
+    "SYM": {POS: SYM},
+    "VERB": {POS: VERB},
+    "X": {POS: X},
 }
--- a/spacy/lang/ja/__init__.py
+++ b/spacy/lang/ja/__init__.py
@@ -16,7 +16,8 @@ from ...util import DummyTokenizer
 # the flow by creating a dummy with the same interface.
 DummyNode = namedtuple("DummyNode", ["surface", "pos", "feature"])
 DummyNodeFeatures = namedtuple("DummyNodeFeatures", ["lemma"])
-DummySpace = DummyNode(' ', ' ', DummyNodeFeatures(' '))
+DummySpace = DummyNode(" ", " ", DummyNodeFeatures(" "))
+

 def try_fugashi_import():
    """Fugashi is required for Japanese support, so check for it.
@@ -27,8 +28,7 @@ def try_fugashi_import():
        return fugashi
    except ImportError:
        raise ImportError(
-            "Japanese support requires Fugashi: "
-            "https://github.com/polm/fugashi"
+            "Japanese support requires Fugashi: " "https://github.com/polm/fugashi"
        )


@@ -55,13 +55,14 @@ def resolve_pos(token):
        return token.pos + ",ADJ"
    return token.pos

+
 def get_words_and_spaces(tokenizer, text):
    """Get the individual tokens that make up the sentence and handle white space.

    Japanese doesn't usually use white space, and MeCab's handling of it for
    multiple spaces in a row is somewhat awkward.
    """
-    
+
    tokens = tokenizer.parseToNodeList(text)

    words = []
@@ -76,6 +77,7 @@ def get_words_and_spaces(tokenizer, text):
        spaces.append(bool(token.white_space))
    return words, spaces

+
 class JapaneseTokenizer(DummyTokenizer):
    def __init__(self, cls, nlp=None):
        self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
--- a/spacy/lang/lb/punctuation.py
+++ b/spacy/lang/lb/punctuation.py
@@ -1,8 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals

-from ..char_classes import LIST_ELLIPSES, LIST_ICONS
-from ..char_classes import CONCAT_QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
+from ..char_classes import LIST_ELLIPSES, LIST_ICONS, ALPHA, ALPHA_LOWER, ALPHA_UPPER

 ELISION = " ' ’ ".strip().replace(" ", "")

--- a/spacy/lang/lb/tokenizer_exceptions.py
+++ b/spacy/lang/lb/tokenizer_exceptions.py
@@ -20,7 +20,7 @@ for exc_data in [
    {ORTH: "asw.", LEMMA: "an sou weider", NORM: "an sou weider"},
    {ORTH: "etc.", LEMMA: "et cetera", NORM: "et cetera"},
    {ORTH: "bzw.", LEMMA: "bezéiungsweis", NORM: "bezéiungsweis"},
-    {ORTH: "Jan.", LEMMA: "Januar", NORM: "Januar"}
+    {ORTH: "Jan.", LEMMA: "Januar", NORM: "Januar"},
 ]:
    _exc[exc_data[ORTH]] = [exc_data]

--- a/spacy/lang/nb/tag_map.py
+++ b/spacy/lang/nb/tag_map.py
@@ -467,38 +467,110 @@ TAG_MAP = {
    "VERB__VerbForm=Part": {"morph": "VerbForm=Part", POS: VERB},
    "VERB___": {"morph": "_", POS: VERB},
    "X___": {"morph": "_", POS: X},
-    'CCONJ___': {"morph": "_", POS: CCONJ},
+    "CCONJ___": {"morph": "_", POS: CCONJ},
    "ADJ__Abbr=Yes": {"morph": "Abbr=Yes", POS: ADJ},
    "ADJ__Abbr=Yes|Degree=Pos": {"morph": "Abbr=Yes|Degree=Pos", POS: ADJ},
-    "ADJ__Case=Gen|Definite=Def|Number=Sing|VerbForm=Part": {"morph": "Case=Gen|Definite=Def|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Def|Number=Sing|VerbForm=Part": {"morph": "Definite=Def|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Ind|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Number=Sing|VerbForm=Part", POS: ADJ},
+    "ADJ__Case=Gen|Definite=Def|Number=Sing|VerbForm=Part": {
+        "morph": "Case=Gen|Definite=Def|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Def|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Def|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Ind|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
    "ADJ__Number=Sing|VerbForm=Part": {"morph": "Number=Sing|VerbForm=Part", POS: ADJ},
    "ADJ__VerbForm=Part": {"morph": "VerbForm=Part", POS: ADJ},
    "ADP__Abbr=Yes": {"morph": "Abbr=Yes", POS: ADP},
    "ADV__Abbr=Yes": {"morph": "Abbr=Yes", POS: ADV},
-    "DET__Case=Gen|Gender=Masc|Number=Sing|PronType=Art": {"morph": "Case=Gen|Gender=Masc|Number=Sing|PronType=Art", POS: DET},
-    "DET__Case=Gen|Number=Plur|PronType=Tot": {"morph": "Case=Gen|Number=Plur|PronType=Tot", POS: DET},
+    "DET__Case=Gen|Gender=Masc|Number=Sing|PronType=Art": {
+        "morph": "Case=Gen|Gender=Masc|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Case=Gen|Number=Plur|PronType=Tot": {
+        "morph": "Case=Gen|Number=Plur|PronType=Tot",
+        POS: DET,
+    },
    "DET__Definite=Def|PronType=Prs": {"morph": "Definite=Def|PronType=Prs", POS: DET},
-    "DET__Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs": {"morph": "Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs": {"morph": "Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs": {"morph": "Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Art": {"morph": "Gender=Fem|Number=Sing|PronType=Art", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Ind": {"morph": "Gender=Fem|Number=Sing|PronType=Ind", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Prs": {"morph": "Gender=Fem|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Tot": {"morph": "Gender=Fem|Number=Sing|PronType=Tot", POS: DET},
-    "DET__Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg": {"morph": "Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg", POS: DET},
-    "DET__Gender=Masc|Number=Sing|PronType=Art": {"morph": "Gender=Masc|Number=Sing|PronType=Art", POS: DET},
-    "DET__Gender=Masc|Number=Sing|PronType=Ind": {"morph": "Gender=Masc|Number=Sing|PronType=Ind", POS: DET},
-    "DET__Gender=Masc|Number=Sing|PronType=Tot": {"morph": "Gender=Masc|Number=Sing|PronType=Tot", POS: DET},
-    "DET__Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg": {"morph": "Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Art": {"morph": "Gender=Neut|Number=Sing|PronType=Art", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Dem,Ind": {"morph": "Gender=Neut|Number=Sing|PronType=Dem,Ind", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Ind": {"morph": "Gender=Neut|Number=Sing|PronType=Ind", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Tot": {"morph": "Gender=Neut|Number=Sing|PronType=Tot", POS: DET},
-    "DET__Number=Plur|Polarity=Neg|PronType=Neg": {"morph": "Number=Plur|Polarity=Neg|PronType=Neg", POS: DET},
+    "DET__Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs": {
+        "morph": "Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs": {
+        "morph": "Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs": {
+        "morph": "Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Art": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Ind": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Ind",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Prs": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Tot": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Tot",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg": {
+        "morph": "Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|PronType=Art": {
+        "morph": "Gender=Masc|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|PronType=Ind": {
+        "morph": "Gender=Masc|Number=Sing|PronType=Ind",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|PronType=Tot": {
+        "morph": "Gender=Masc|Number=Sing|PronType=Tot",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg": {
+        "morph": "Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Art": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Dem,Ind": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Dem,Ind",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Ind": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Ind",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Tot": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Tot",
+        POS: DET,
+    },
+    "DET__Number=Plur|Polarity=Neg|PronType=Neg": {
+        "morph": "Number=Plur|Polarity=Neg|PronType=Neg",
+        POS: DET,
+    },
    "DET__Number=Plur|PronType=Art": {"morph": "Number=Plur|PronType=Art", POS: DET},
    "DET__Number=Plur|PronType=Ind": {"morph": "Number=Plur|PronType=Ind", POS: DET},
    "DET__Number=Plur|PronType=Prs": {"morph": "Number=Plur|PronType=Prs", POS: DET},
@ -507,57 +579,183 @@ TAG_MAP = {
    "DET__PronType=Prs": {"morph": "PronType=Prs", POS: DET},
    "NOUN__Abbr=Yes": {"morph": "Abbr=Yes", POS: NOUN},
    "NOUN__Abbr=Yes|Case=Gen": {"morph": "Abbr=Yes|Case=Gen", POS: NOUN},
-    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing": {"morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing", POS: NOUN},
-    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing": {"morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing", POS: NOUN},
-    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing": {"morph": "Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing", POS: NOUN},
+    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing": {
+        "morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing",
+        POS: NOUN,
+    },
+    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing": {
+        "morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing",
+        POS: NOUN,
+    },
+    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing": {
+        "morph": "Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing",
+        POS: NOUN,
+    },
    "NOUN__Abbr=Yes|Gender=Masc": {"morph": "Abbr=Yes|Gender=Masc", POS: NOUN},
-    "NUM__Case=Gen|Number=Plur|NumType=Card": {"morph": "Case=Gen|Number=Plur|NumType=Card", POS: NUM},
-    "NUM__Definite=Def|Number=Sing|NumType=Card": {"morph": "Definite=Def|Number=Sing|NumType=Card", POS: NUM},
+    "NUM__Case=Gen|Number=Plur|NumType=Card": {
+        "morph": "Case=Gen|Number=Plur|NumType=Card",
+        POS: NUM,
+    },
+    "NUM__Definite=Def|Number=Sing|NumType=Card": {
+        "morph": "Definite=Def|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
    "NUM__Definite=Def|NumType=Card": {"morph": "Definite=Def|NumType=Card", POS: NUM},
-    "NUM__Gender=Fem|Number=Sing|NumType=Card": {"morph": "Gender=Fem|Number=Sing|NumType=Card", POS: NUM},
-    "NUM__Gender=Masc|Number=Sing|NumType=Card": {"morph": "Gender=Masc|Number=Sing|NumType=Card", POS: NUM},
-    "NUM__Gender=Neut|Number=Sing|NumType=Card": {"morph": "Gender=Neut|Number=Sing|NumType=Card", POS: NUM},
+    "NUM__Gender=Fem|Number=Sing|NumType=Card": {
+        "morph": "Gender=Fem|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
+    "NUM__Gender=Masc|Number=Sing|NumType=Card": {
+        "morph": "Gender=Masc|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
+    "NUM__Gender=Neut|Number=Sing|NumType=Card": {
+        "morph": "Gender=Neut|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
    "NUM__Number=Plur|NumType=Card": {"morph": "Number=Plur|NumType=Card", POS: NUM},
    "NUM__Number=Sing|NumType=Card": {"morph": "Number=Sing|NumType=Card", POS: NUM},
    "NUM__NumType=Card": {"morph": "NumType=Card", POS: NUM},
    "PART__Polarity=Neg": {"morph": "Polarity=Neg", POS: PART},
-    "PRON__Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs": {"morph": "Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs": {"morph": "Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Number=Plur|PronType=Rcp": {"morph": "Animacy=Hum|Number=Plur|PronType=Rcp", POS: PRON},
-    "PRON__Animacy=Hum|Number=Sing|PronType=Art,Prs": {"morph": "Animacy=Hum|Number=Sing|PronType=Art,Prs", POS: PRON},
-    "PRON__Animacy=Hum|Poss=Yes|PronType=Int": {"morph": "Animacy=Hum|Poss=Yes|PronType=Int", POS: PRON},
+    "PRON__Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs": {
+        "morph": "Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs": {
+        "morph": "Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Number=Plur|PronType=Rcp": {
+        "morph": "Animacy=Hum|Number=Plur|PronType=Rcp",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Number=Sing|PronType=Art,Prs": {
+        "morph": "Animacy=Hum|Number=Sing|PronType=Art,Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Poss=Yes|PronType=Int": {
+        "morph": "Animacy=Hum|Poss=Yes|PronType=Int",
+        POS: PRON,
+    },
    "PRON__Animacy=Hum|PronType=Int": {"morph": "Animacy=Hum|PronType=Int", POS: PRON},
-    "PRON__Case=Acc|PronType=Prs|Reflex=Yes": {"morph": "Case=Acc|PronType=Prs|Reflex=Yes", POS: PRON},
-    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs": { "morph": "Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs", POS: PRON},
-    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs": {"morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs", POS: PRON},
-    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot": {"morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot", POS: PRON},
-    "PRON__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {"morph": "Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs": {"morph": "Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs": {"morph": "Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs", POS: PRON},
-    "PRON__Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs": {"morph": "Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs": {"morph": "Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs", POS: PRON},
-    "PRON__Number=Plur|Person=3|PronType=Ind,Prs": {"morph": "Number=Plur|Person=3|PronType=Ind,Prs", POS: PRON},
-    "PRON__Number=Plur|Person=3|PronType=Prs,Tot": {"morph": "Number=Plur|Person=3|PronType=Prs,Tot", POS: PRON},
-    "PRON__Number=Plur|Poss=Yes|PronType=Prs": {"morph": "Number=Plur|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Number=Plur|Poss=Yes|PronType=Rcp": {"morph": "Number=Plur|Poss=Yes|PronType=Rcp", POS: PRON},
-    "PRON__Number=Sing|Polarity=Neg|PronType=Neg": {"morph": "Number=Sing|Polarity=Neg|PronType=Neg", POS: PRON},
+    "PRON__Case=Acc|PronType=Prs|Reflex=Yes": {
+        "morph": "Case=Acc|PronType=Prs|Reflex=Yes",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs": {
+        "morph": "Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs": {
+        "morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot": {
+        "morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {
+        "morph": "Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs": {
+        "morph": "Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs": {
+        "morph": "Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs": {
+        "morph": "Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs": {
+        "morph": "Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Person=3|PronType=Ind,Prs": {
+        "morph": "Number=Plur|Person=3|PronType=Ind,Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Person=3|PronType=Prs,Tot": {
+        "morph": "Number=Plur|Person=3|PronType=Prs,Tot",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Poss=Yes|PronType=Prs": {
+        "morph": "Number=Plur|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Poss=Yes|PronType=Rcp": {
+        "morph": "Number=Plur|Poss=Yes|PronType=Rcp",
+        POS: PRON,
+    },
+    "PRON__Number=Sing|Polarity=Neg|PronType=Neg": {
+        "morph": "Number=Sing|Polarity=Neg|PronType=Neg",
+        POS: PRON,
+    },
    "PRON__PronType=Prs": {"morph": "PronType=Prs", POS: PRON},
    "PRON__PronType=Rel": {"morph": "PronType=Rel", POS: PRON},
    "PROPN__Abbr=Yes": {"morph": "Abbr=Yes", POS: PROPN},
    "PROPN__Abbr=Yes|Case=Gen": {"morph": "Abbr=Yes|Case=Gen", POS: PROPN},
-    "VERB__Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin": {"morph": "Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin", POS: VERB},
-    "VERB__Definite=Ind|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Number=Sing|VerbForm=Part", POS: VERB},
+    "VERB__Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin": {
+        "morph": "Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin",
+        POS: VERB,
+    },
+    "VERB__Definite=Ind|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Number=Sing|VerbForm=Part",
+        POS: VERB,
+    },
 }
--- a/spacy/lang/yo/__init__.py
+++ b/spacy/lang/yo/__init__.py
@ -0,0 +1,24 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+from ...language import Language
+from ...attrs import LANG
+
+
+class YorubaDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
+    lex_attr_getters[LANG] = lambda text: "yo"
+    stop_words = STOP_WORDS
+    tokenizer_exceptions = BASE_EXCEPTIONS
+
+
+class Yoruba(Language):
+    lang = "yo"
+    Defaults = YorubaDefaults
+
+
+__all__ = ["Yoruba"]
--- a/spacy/lang/yo/examples.py
+++ b/spacy/lang/yo/examples.py
@ -0,0 +1,26 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+"""
+Example sentences to test spaCy and its language models.
+
+>>> from spacy.lang.yo.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+# 1. https://yo.wikipedia.org/wiki/Wikipedia:%C3%80y%E1%BB%8Dk%C3%A0_p%C3%A0t%C3%A0k%C3%AC
+# 2. https://yo.wikipedia.org/wiki/Oj%C3%BAew%C3%A9_%C3%80k%E1%BB%8D%CC%81k%E1%BB%8D%CC%81
+# 3. https://www.bbc.com/yoruba
+
+sentences = [
+    "Ìjọba Tanzania fi Ajìjàgbara Ọmọ Orílẹ̀-èdèe Uganda sí àtìmọ́lé",
+    "Olúṣẹ́gun Ọbásanjọ́, tí ó jẹ́ Ààrẹ ìjọba ológun àná (láti ọdún 1976 sí 1979), tí ó sì tún ṣe  Ààrẹ ìjọba alágbádá tí ìbò gbé wọlé (ní ọdún 1999 sí 2007), kúndùn láti máa bu ẹnu àtẹ́ lu àwọn "
+    "ètò ìjọba Ààrẹ orílẹ̀-èdè Nàìjíríà tí ó jẹ tẹ̀lé e.",
+    "Akin Alabi rán ẹnu mọ́ agbárá Adárí Òsìsẹ̀, àwọn ọmọ Nàìjíríà dẹnu bò ó",
+    "Ta ló leè dúró s'ẹ́gbẹ̀ẹ́ Okunnu láì rẹ́rìín?",
+    "Dídarapọ̀ mọ́n ìpolongo",
+    "Bi a se n so, omobinrin ni oruko ni ojo kejo bee naa ni omokunrin ni oruko ni ojo kesan.",
+    "Oríṣìíríṣìí nǹkan ló le yọrí sí orúkọ tí a sọ ọmọ",
+    "Gbogbo won ni won ni oriki ti won",
+]
--- a/spacy/lang/yo/lex_attrs.py
+++ b/spacy/lang/yo/lex_attrs.py
@ -0,0 +1,115 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+import unicodedata
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "ení",
+    "oókàn",
+    "ọ̀kanlá",
+    "ẹ́ẹdọ́gbọ̀n",
+    "àádọ́fà",
+    "ẹ̀walélúɡba",
+    "egbèje",
+    "ẹgbàárin",
+    "èjì",
+    "eéjì",
+    "èjìlá",
+    "ọgbọ̀n",
+    "ọgọ́fà",
+    "ọ̀ọ́dúrún",
+    "ẹgbẹ̀jọ",
+    "ẹ̀ẹ́dẹ́ɡbàárùn",
+    "ẹ̀ta",
+    "ẹẹ́ta",
+    "ẹ̀talá",
+    "aárùndílogójì",
+    "àádóje",
+    "irinwó",
+    "ẹgbẹ̀sàn",
+    "ẹgbàárùn",
+    "ẹ̀rin",
+    "ẹẹ́rin",
+    "ẹ̀rinlá",
+    "ogójì",
+    "ogóje",
+    "ẹ̀ẹ́dẹ́gbẹ̀ta",
+    "ẹgbàá",
+    "ẹgbàájọ",
+    "àrún",
+    "aárùn",
+    "ẹ́ẹdógún",
+    "àádọ́ta",
+    "àádọ́jọ",
+    "ẹgbẹ̀ta",
+    "ẹgboókànlá",
+    "ẹgbàawǎ",
+    "ẹ̀fà",
+    "ẹẹ́fà",
+    "ẹẹ́rìndílógún",
+    "ọgọ́ta",
+    "ọgọ́jọ",
+    "ọ̀ọ́dẹ́gbẹ̀rin",
+    "ẹgbẹ́ẹdógún",
+    "ọkẹ́marun",
+    "èje",
+    "etàdílógún",
+    "àádọ́rin",
+    "àádọ́sán",
+    "ẹgbẹ̀rin",
+    "ẹgbàajì",
+    "ẹgbẹ̀ẹgbẹ̀rún",
+    "ẹ̀jọ",
+    "ẹẹ́jọ",
+    "eéjìdílógún",
+    "ọgọ́rin",
+    "ọgọsàn",
+    "ẹ̀ẹ́dẹ́gbẹ̀rún",
+    "ẹgbẹ́ẹdọ́gbọ̀n",
+    "ọgọ́rùn ọkẹ́",
+    "ẹ̀sán",
+    "ẹẹ́sàn",
+    "oókàndílógún",
+    "àádọ́rùn",
+    "ẹ̀wadilúɡba",
+    "ẹgbẹ̀rún",
+    "ẹgbàáta",
+    "ẹ̀wá",
+    "ẹẹ́wàá",
+    "ogún",
+    "ọgọ́rùn",
+    "igba",
+    "ẹgbẹ̀fà",
+    "ẹ̀ẹ́dẹ́ɡbarin",
+]
+
+
+def strip_accents_text(text):
+    """
+    Normalizes the string to NFD and returns only the base characters,
+    dropping combining marks (Unicode category "Mn").
+    :param text: the string to strip
+    :return: input string without diacritic adornments on base characters
+    """
+    return "".join(
+        c for c in unicodedata.normalize("NFD", text) if unicodedata.category(c) != "Mn"
+    )
+
+
+def like_num(text):
+    text = text.replace(",", "").replace(".", "")
+    num_markers = ["dí", "dọ", "lé", "dín", "di", "din", "le", "do"]
+    if any(mark in text for mark in num_markers):
+        return True
+    text = strip_accents_text(text)
+    _num_words_stripped = [strip_accents_text(num) for num in _num_words]
+    if text.isdigit():
+        return True
+    if text in _num_words_stripped or text.lower() in _num_words_stripped:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
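
The accent-insensitive lookup in `like_num` above can be illustrated with a minimal standalone sketch. It assumes only the standard library; `SAMPLE_NUM_WORDS` and `in_sample` are hypothetical names introduced here, and the list is a small subset of `_num_words`, not the full table. It also leaves out the `num_markers` substring check that the real function runs first.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents_text(text):
    """Drop combining marks (Unicode category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


# Illustrative subset of the number words in the diff, not the full list.
SAMPLE_NUM_WORDS = ["ení", "èjì", "ogún", "aárùn"]
SAMPLE_STRIPPED = {strip_accents_text(word) for word in SAMPLE_NUM_WORDS}


def in_sample(text):
    """Accent- and case-insensitive lookup against the sample list only."""
    stripped = strip_accents_text(text)
    return stripped in SAMPLE_STRIPPED or stripped.lower() in SAMPLE_STRIPPED
```

Because both the input and the word list are stripped before comparison, unaccented spellings such as `eji` and upper-case forms such as `EJI` match the accented entry `èjì`, which appears to be what `test_yo_lex_attrs_capitals` relies on.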
--- a/spacy/lang/yo/stop_words.py
+++ b/spacy/lang/yo/stop_words.py
@ -0,0 +1,12 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+# stop words as whitespace-separated list.
+# Source: https://raw.githubusercontent.com/dohliam/more-stoplists/master/yo/yo.txt
+
+STOP_WORDS = set(
+    "a an b bá bí bẹ̀rẹ̀ d e f fún fẹ́ g gbogbo i inú j jù jẹ jẹ́ k kan kì kí kò "
+    "l láti lè lọ m mi mo máa mọ̀ n ni náà ní nígbà nítorí nǹkan o p padà pé "
+    "púpọ̀ pẹ̀lú r rẹ̀ s sì sí sínú t ti tí u w wà wá wọn wọ́n y yìí à àti àwọn á "
+    "è é ì í ò òun ó ù ú ń ńlá ǹ ̀ ́ ̣ ṣ ṣe ṣé ṣùgbọ́n ẹ ẹmọ́ ọ ọjọ́ ọ̀pọ̀lọpọ̀".split()
+)
--- a/spacy/pipeline/entityruler.py
+++ b/spacy/pipeline/entityruler.py
@ -295,10 +295,9 @@ class EntityRuler(object):
            deserializers_patterns = {
                "patterns": lambda p: self.add_patterns(
                    srsly.read_jsonl(p.with_suffix(".jsonl"))
-                )}
-            deserializers_cfg = {
-                "cfg": lambda p: cfg.update(srsly.read_json(p))
+                )
            }
+            deserializers_cfg = {"cfg": lambda p: cfg.update(srsly.read_json(p))}
            from_disk(path, deserializers_cfg, {})
            self.overwrite = cfg.get("overwrite", False)
            self.phrase_matcher_attr = cfg.get("phrase_matcher_attr")
--- a/spacy/tests/conftest.py
+++ b/spacy/tests/conftest.py
@ -220,6 +220,11 @@ def ur_tokenizer():
    return get_lang_class("ur").Defaults.create_tokenizer()


+@pytest.fixture(scope="session")
+def yo_tokenizer():
+    return get_lang_class("yo").Defaults.create_tokenizer()
+
+
@pytest.fixture(scope="session")
 def zh_tokenizer():
    pytest.importorskip("jieba")
--- a/spacy/tests/lang/fi/test_tokenizer.py
+++ b/spacy/tests/lang/fi/test_tokenizer.py
@ -15,7 +15,7 @@ ABBREVIATION_TESTS = [
 HYPHENATED_TESTS = [
    (
        "1700-luvulle sijoittuva taide-elokuva",
-        ["1700-luvulle", "sijoittuva", "taide-elokuva"]
+        ["1700-luvulle", "sijoittuva", "taide-elokuva"],
    )
 ]

--- a/spacy/tests/lang/lb/test_exceptions.py
+++ b/spacy/tests/lang/lb/test_exceptions.py
@ -3,16 +3,19 @@ from __future__ import unicode_literals

 import pytest

+
@pytest.mark.parametrize("text", ["z.B.", "Jan."])
 def test_lb_tokenizer_handles_abbr(lb_tokenizer, text):
    tokens = lb_tokenizer(text)
    assert len(tokens) == 1

+
@pytest.mark.parametrize("text", ["d'Saach", "d'Kanner", "d’Welt", "d’Suen"])
 def test_lb_tokenizer_splits_contractions(lb_tokenizer, text):
    tokens = lb_tokenizer(text)
    assert len(tokens) == 2

+
 def test_lb_tokenizer_handles_exc_in_text(lb_tokenizer):
    text = "Mee 't ass net evident, d'Liewen."
    tokens = lb_tokenizer(text)
@ -20,6 +23,7 @@ def test_lb_tokenizer_handles_exc_in_text(lb_tokenizer):
    assert tokens[1].text == "'t"
    assert tokens[1].lemma_ == "et"

+
@pytest.mark.parametrize("text,norm", [("dass", "datt"), ("viläicht", "vläicht")])
 def test_lb_norm_exceptions(lb_tokenizer, text, norm):
    tokens = lb_tokenizer(text)
--- a/spacy/tests/lang/lb/test_text.py
+++ b/spacy/tests/lang/lb/test_text.py
@ -16,7 +16,7 @@ def test_lb_tokenizer_handles_long_text(lb_tokenizer):
    [
        ("»Wat ass mat mir geschitt?«, huet hie geduecht.", 13),
        ("“Dëst fréi Opstoen”, denkt hien, “mécht ee ganz duercherneen. ", 15),
-        ("Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.", 14)
+        ("Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.", 14),
    ],
 )
 def test_lb_tokenizer_handles_examples(lb_tokenizer, text, length):
--- a/spacy/tests/lang/test_initialize.py
+++ b/spacy/tests/lang/test_initialize.py
@ -11,7 +11,7 @@ from spacy.util import get_lang_class
 LANGUAGES = ["af", "ar", "bg", "bn", "ca", "cs", "da", "de", "el", "en", "es",
             "et", "fa", "fi", "fr", "ga", "he", "hi", "hr", "hu", "id", "is",
             "it", "kn", "lt", "lv", "nb", "nl", "pl", "pt", "ro", "si", "sk",
-             "sl", "sq", "sr", "sv", "ta", "te", "tl", "tr", "tt", "ur"]
+             "sl", "sq", "sr", "sv", "ta", "te", "tl", "tr", "tt", "ur", "yo"]
 # fmt: on


--- a/spacy/tests/lang/yo/__init__.py
+++ b/spacy/tests/lang/yo/__init__.py
--- a/spacy/tests/lang/yo/test_text.py
+++ b/spacy/tests/lang/yo/test_text.py
@ -0,0 +1,32 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+import pytest
+from spacy.lang.yo.lex_attrs import like_num
+
+
+def test_yo_tokenizer_handles_long_text(yo_tokenizer):
+    text = """Àwọn ọmọ ìlú tí wọ́n ń ṣàmúlò ayélujára ti bẹ̀rẹ̀ ìkọkúkọ sórí àwòrán ààrẹ Nkurunziza nínú ìfẹ̀hónúhàn pẹ̀lú àmì ìdámọ̀: Nkurunziza àti Burundi:
+        Ọmọ ilé ẹ̀kọ́ gíga ní ẹ̀wọ̀n fún kíkọ ìkọkúkọ sí orí àwòrán Ààrẹ .
+        Bí mo bá ṣe èyí ní Burundi , ó ṣe é ṣe kí a fi mí sí àtìmọ́lé
+        Ìjọba Burundi fi akẹ́kọ̀ọ́bìnrin sí àtìmọ́lé látàrí ẹ̀sùn ìkọkúkọ sí orí àwòrán ààrẹ. A túwíìtì àwòrán ìkọkúkọ wa ní ìbánikẹ́dùn ìṣẹ̀lẹ̀ náà.
+        Wọ́n ní kí a dán an wò, kí a kọ nǹkan sí orí àwòrán ààrẹ  mo sì ṣe bẹ́ẹ̀. Mo ní ìgbóyà wípé ẹnikẹ́ni kò ní mú mi níbí.
+        Ìfòfinlíle mú àtakò"""
+    tokens = yo_tokenizer(text)
+    assert len(tokens) == 121
+
+
+@pytest.mark.parametrize(
+    "text,match",
+    [("ení", True), ("ogun", True), ("mewadinlogun", True), ("ten", False)],
+)
+def test_lex_attrs_like_number(yo_tokenizer, text, match):
+    tokens = yo_tokenizer(text)
+    assert len(tokens) == 1
+    assert tokens[0].like_num == match
+
+
+@pytest.mark.parametrize("word", ["eji", "ejila", "ogun", "aárùn"])
+def test_yo_lex_attrs_capitals(word):
+    assert like_num(word)
+    assert like_num(word.upper())
--- a/spacy/tests/parser/test_parse.py
+++ b/spacy/tests/parser/test_parse.py
@ -151,17 +151,17 @@ def test_parser_arc_eager_finalize_state(en_tokenizer, en_parser):


 def test_parser_set_sent_starts(en_vocab):
+    # fmt: off
    words = ['Ein', 'Satz', '.', 'Außerdem', 'ist', 'Zimmer', 'davon', 'überzeugt', ',', 'dass', 'auch', 'epige-', '\n', 'netische', 'Mechanismen', 'eine', 'Rolle', 'spielen', ',', 'also', 'Vorgänge', ',', 'die', '\n', 'sich', 'darauf', 'auswirken', ',', 'welche', 'Gene', 'abgelesen', 'werden', 'und', '\n', 'welche', 'nicht', '.', '\n']
    heads = [1, 0, -1, 27, 0, -1, 1, -3, -1, 8, 4, 3, -1, 1, 3, 1, 1, -11, -1, 1, -9, -1, 4, -1, 2, 1, -6, -1, 1, 2, 1, -6, -1, -1, -17, -31, -32, -1]
    deps = ['nk', 'ROOT', 'punct', 'mo', 'ROOT', 'sb', 'op', 'pd', 'punct', 'cp', 'mo', 'nk', '', 'nk', 'sb', 'nk', 'oa', 're', 'punct', 'mo', 'app', 'punct', 'sb', '', 'oa', 'op', 'rc', 'punct', 'nk', 'sb', 'oc', 're', 'cd', '', 'oa', 'ng', 'punct', '']
-    doc = get_doc(
-        en_vocab, words=words, deps=deps, heads=heads
-    )
+    # fmt: on
+    doc = get_doc(en_vocab, words=words, deps=deps, heads=heads)
    for i in range(len(words)):
        if i == 0 or i == 3:
-            assert doc[i].is_sent_start == True
+            assert doc[i].is_sent_start is True
        else:
-            assert doc[i].is_sent_start == None
+            assert doc[i].is_sent_start is None
    for sent in doc.sents:
        for token in sent:
            assert token.head in sent
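
The switch from `== True` / `== None` to `is True` / `is None` in the hunk above tightens the assertions. A minimal sketch of the difference, using a hypothetical `AlwaysEqual` class that is not part of spaCy:

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
equal_to_none = value == None  # noqa: E711 -- True, via the custom __eq__
is_none = value is None  # False: identity cannot be overridden

n = 1
one_equals_true = n == True  # noqa: E712 -- True, since bool is an int subclass
one_is_true = n is True  # False: 1 and True are distinct objects
```

With `is`, the test passes only when `is_sent_start` is the actual `True` or `None` object, not merely something that compares equal to it.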
--- a/spacy/tests/pipeline/test_tagger.py
+++ b/spacy/tests/pipeline/test_tagger.py
@ -3,7 +3,6 @@ from __future__ import unicode_literals

 import pytest
 from spacy.language import Language
-from spacy.pipeline import Tagger


 def test_label_types():
--- a/spacy/tests/regression/test_issue4674.py
+++ b/spacy/tests/regression/test_issue4674.py
@ -1,11 +1,12 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import pytest
 from spacy.kb import KnowledgeBase
 from spacy.util import ensure_path
-
 from spacy.lang.en import English
-from spacy.tests.util import make_tempdir
+
+from ..util import make_tempdir


 def test_issue4674():
@ -15,7 +16,12 @@ def test_issue4674():

    vector1 = [0.9, 1.1, 1.01]
    vector2 = [1.8, 2.25, 2.01]
-    kb.set_entities(entity_list=["Q1", "Q1"], freq_list=[32, 111], vector_list=[vector1, vector2])
+    with pytest.warns(UserWarning):
+        kb.set_entities(
+            entity_list=["Q1", "Q1"],
+            freq_list=[32, 111],
+            vector_list=[vector1, vector2],
+        )

    assert kb.get_size_entities() == 1

@ -31,4 +37,3 @@ def test_issue4674():
        kb2.load_bulk(str(file_path))

    assert kb2.get_size_entities() == 1
-
--- a/spacy/tokens/doc.pyx
+++ b/spacy/tokens/doc.pyx
@ -994,9 +994,9 @@ cdef class Doc:
         order, and no span intersection is allowed.

        spans (Span[]): Spans to merge, in document order, with all span
-            intersections empty. Cannot be emty.
+            intersections empty. Cannot be empty.
        attributes (Dictionary[]): Attributes to assign to the merged tokens. By default,
-            must be the same lenghth as spans, emty dictionaries are allowed.
+            must be the same length as spans, empty dictionaries are allowed.
            attributes are inherited from the syntactic root of the span.
        RETURNS (Token): The first newly merged token.
        """
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@ -77,9 +77,9 @@ more efficient than processing texts one-by-one.
 Early versions of spaCy used simple statistical models that could be efficiently
 multi-threaded, as we were able to entirely release Python's global interpreter
 lock. The multi-threading was controlled using the `n_threads` keyword argument
-to the `.pipe` method. This keyword argument is now deprecated as of v2.1.0.
-Future versions may introduce a `n_process` argument for parallel inference via
-multiprocessing.
+to the `.pipe` method. This keyword argument is now deprecated as of v2.1.0. In
+v2.2.2, a new keyword argument, `n_process`, was introduced to control parallel
+inference via multiprocessing.

 </Infobox>

@ -98,6 +98,7 @@ multiprocessing.
 | `batch_size`                                 | int   | The number of texts to buffer.                                                                                                                             |
 | `disable`                                    | list  | Names of pipeline components to [disable](/usage/processing-pipelines#disabling).                                                                          |
 | `component_cfg` <Tag variant="new">2.1</Tag> | dict  | Config parameters for specific pipeline components, keyed by component name.                                                                               |
+| `n_process` <Tag variant="new">2.2.2</Tag>   | int   | Number of processors to use; only supported in Python 3. Defaults to `1`.                                                                                  |
 | **YIELDS**                                   | `Doc` | Documents in the order of the original text.                                                                                                               |

 ## Language.update {#update tag="method"}
--- a/website/docs/usage/index.md
+++ b/website/docs/usage/index.md
@ -124,9 +124,8 @@ interface for GPU arrays.
 spaCy can be installed on GPU by specifying `spacy[cuda]`, `spacy[cuda90]`,
 `spacy[cuda91]`, `spacy[cuda92]` or `spacy[cuda100]`. If you know your cuda
 version, using the more explicit specifier allows cupy to be installed via
-wheel, saving some compilation time. The specifiers should install two
-libraries: [`cupy`](https://cupy.chainer.org) and
-[`thinc_gpu_ops`](https://github.com/explosion/thinc_gpu_ops).
+wheel, saving some compilation time. The specifiers should install
+[`cupy`](https://cupy.chainer.org).

 ```bash
 $ pip install -U spacy[cuda92]