💫 Tidy up and auto-format .py files (#2983)


## Description
- [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files.
- [x] Update flake8 config to exclude very large files (lemmatization tables etc.)
- [x] Update code to be compatible with flake8 rules
- [x] Fix various small bugs, inconsistencies and messy stuff in the language data
- [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means)

Once #2932 (which auto-formats and tidies up the CLI) is merged, we'll be able to run `flake8 spacy` and actually get meaningful results.

At the moment, the code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information.
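
For reference, the local workflow this sets up is: format with `black`, then lint with `flake8`. Here's a minimal sketch (the version pins come from `requirements.txt` in this PR; `black spacy` simply points `black` at the package directory):

```bash
# Install the formatting and linting tools pinned by this PR
pip install "black==18.9b0" "flake8>=3.5.0,<3.6.0"

# Auto-format the package in place, then lint it using the .flake8 config
black spacy
flake8 spacy
```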

### Types of change
enhancement, code style

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
Ines Montani 2018-11-30 17:03:03 +01:00 committed by GitHub
parent 852bc2ac16
commit eddeb36c96
268 changed files with 25626 additions and 17854 deletions

.flake8
View File

@@ -1,4 +1,13 @@
 [flake8]
-ignore = E203, E266, E501, W503
+ignore = E203, E266, E501, E731, W503
 max-line-length = 80
 select = B,C,E,F,W,T4,B9
+exclude =
+    .env,
+    .git,
+    __pycache__,
+    lemmatizer.py,
+    lookup.py,
+    _tokenizer_exceptions_list.py,
+    spacy/lang/fr/lemmatizer,
+    spacy/lang/nb/lemmatizer

View File

@@ -186,13 +186,99 @@ sure your test passes and reference the issue in your commit message.
 ## Code conventions
 
 Code should loosely follow [pep8](https://www.python.org/dev/peps/pep-0008/).
-Regular line length is **80 characters**, with some tolerance for lines up to
-90 characters if the alternative would be worse — for instance, if your list
-comprehension comes to 82 characters, it's better not to split it over two lines.
-You can also use a linter like [`flake8`](https://pypi.python.org/pypi/flake8)
-or [`frosted`](https://pypi.python.org/pypi/frosted) – just keep in mind that
-it won't work very well for `.pyx` files and will complain about Cython syntax
-like `<int*>` or `cimport`.
+As of `v2.1.0`, spaCy uses [`black`](https://github.com/ambv/black) for code
+formatting and [`flake8`](http://flake8.pycqa.org/en/latest/) for linting its
+Python modules. If you've built spaCy from source, you'll already have both
+tools installed.
+
+**⚠️ Note that formatting and linting is currently only possible for Python
+modules in `.py` files, not Cython modules in `.pyx` and `.pxd` files.**
+
+### Code formatting
+
+[`black`](https://github.com/ambv/black) is an opinionated Python code
+formatter, optimised to produce readable code and small diffs. You can run
+`black` from the command-line, or via your code editor. For example, if you're
+using [Visual Studio Code](https://code.visualstudio.com/), you can add the
+following to your `settings.json` to use `black` for formatting and auto-format
+your files on save:
+
+```json
+{
+    "python.formatting.provider": "black",
+    "[python]": {
+        "editor.formatOnSave": true
+    }
+}
+```
+
+[See here](https://github.com/ambv/black#editor-integration) for the full
+list of available editor integrations.
+
+#### Disabling formatting
+
+There are a few cases where auto-formatting doesn't improve readability, for
+example in some of the language data files like the `tag_map.py`, or in
+the tests that construct `Doc` objects from lists of words and other labels.
+Wrapping a block in `# fmt: off` and `# fmt: on` lets you disable formatting
+for that particular code. Here's an example:
+
+```python
+# fmt: off
+text = "I look forward to using Thingamajig. I've been told it will make my life easier..."
+heads = [1, 0, -1, -2, -1, -1, -5, -1, 3, 2, 1, 0, 2, 1, -3, 1, 1, -3, -7]
+deps = ["nsubj", "ROOT", "advmod", "prep", "pcomp", "dobj", "punct", "",
+        "nsubjpass", "aux", "auxpass", "ROOT", "nsubj", "aux", "ccomp",
+        "poss", "nsubj", "ccomp", "punct"]
+# fmt: on
+```
+
+### Code linting
+
+[`flake8`](http://flake8.pycqa.org/en/latest/) is a tool for enforcing code
+style. It scans one or more files and outputs errors and warnings. This feedback
+can help you stick to general standards and conventions, and can be very useful
+for spotting potential mistakes and inconsistencies in your code. The most
+important things to watch out for are syntax errors and undefined names, but you
+also want to keep an eye on unused declared variables or repeated
+(i.e. overwritten) dictionary keys. If your code was formatted with `black`
+(see above), you shouldn't see any formatting-related warnings.
+
+The [`.flake8`](.flake8) config defines the configuration we use for this
+codebase. For example, we're not super strict about the line length, and we're
+excluding very large files like lemmatization and tokenizer exception tables.
+
+Ideally, running the following command from within the repo directory should
+not return any errors or warnings:
+
+```bash
+flake8 spacy
+```
+
+#### Disabling linting
+
+Sometimes, you explicitly want to write code that's not compatible with our
+rules. For example, a module's `__init__.py` might import a function so other
+modules can import it from there, but `flake8` will complain about an unused
+import. And although it's generally discouraged, there might be cases where it
+makes sense to use a bare `except`.
+
+To ignore a given line, you can add a comment like `# noqa: F401`, specifying
+the code of the error or warning you want to ignore. It's also possible to
+ignore several comma-separated codes at once, e.g. `# noqa: E731,E123`. Here
+are some examples:
+
+```python
+# The imported class isn't used in this file, but imported here, so it can be
+# imported *from* here by another module.
+from .submodule import SomeClass  # noqa: F401
+
+try:
+    do_something()
+except:  # noqa: E722
+    # This bare except is justified, for some specific reason
+    do_something_else()
+```
+
 ### Python conventions

View File

@ -35,41 +35,49 @@ import subprocess
import argparse import argparse
HASH_FILE = 'cythonize.json' HASH_FILE = "cythonize.json"
def process_pyx(fromfile, tofile, language_level='-2'): def process_pyx(fromfile, tofile, language_level="-2"):
print('Processing %s' % fromfile) print("Processing %s" % fromfile)
try: try:
from Cython.Compiler.Version import version as cython_version from Cython.Compiler.Version import version as cython_version
from distutils.version import LooseVersion from distutils.version import LooseVersion
if LooseVersion(cython_version) < LooseVersion('0.19'):
raise Exception('Require Cython >= 0.19') if LooseVersion(cython_version) < LooseVersion("0.19"):
raise Exception("Require Cython >= 0.19")
except ImportError: except ImportError:
pass pass
flags = ['--fast-fail', language_level] flags = ["--fast-fail", language_level]
if tofile.endswith('.cpp'): if tofile.endswith(".cpp"):
flags += ['--cplus'] flags += ["--cplus"]
try: try:
try: try:
r = subprocess.call(['cython'] + flags + ['-o', tofile, fromfile], r = subprocess.call(
env=os.environ) # See Issue #791 ["cython"] + flags + ["-o", tofile, fromfile], env=os.environ
) # See Issue #791
if r != 0: if r != 0:
raise Exception('Cython failed') raise Exception("Cython failed")
except OSError: except OSError:
# There are ways of installing Cython that don't result in a cython # There are ways of installing Cython that don't result in a cython
# executable on the path, see gh-2397. # executable on the path, see gh-2397.
r = subprocess.call([sys.executable, '-c', r = subprocess.call(
'import sys; from Cython.Compiler.Main import ' [
'setuptools_main as main; sys.exit(main())'] + flags + sys.executable,
['-o', tofile, fromfile]) "-c",
"import sys; from Cython.Compiler.Main import "
"setuptools_main as main; sys.exit(main())",
]
+ flags
+ ["-o", tofile, fromfile]
)
if r != 0: if r != 0:
raise Exception('Cython failed') raise Exception("Cython failed")
except OSError: except OSError:
raise OSError('Cython needs to be installed') raise OSError("Cython needs to be installed")
def preserve_cwd(path, func, *args): def preserve_cwd(path, func, *args):
@ -89,12 +97,12 @@ def load_hashes(filename):
def save_hashes(hash_db, filename): def save_hashes(hash_db, filename):
with open(filename, 'w') as f: with open(filename, "w") as f:
f.write(json.dumps(hash_db)) f.write(json.dumps(hash_db))
def get_hash(path): def get_hash(path):
return hashlib.md5(open(path, 'rb').read()).hexdigest() return hashlib.md5(open(path, "rb").read()).hexdigest()
def hash_changed(base, path, db): def hash_changed(base, path, db):
@ -109,25 +117,27 @@ def hash_add(base, path, db):
def process(base, filename, db): def process(base, filename, db):
root, ext = os.path.splitext(filename) root, ext = os.path.splitext(filename)
if ext in ['.pyx', '.cpp']: if ext in [".pyx", ".cpp"]:
if hash_changed(base, filename, db) or not os.path.isfile(os.path.join(base, root + '.cpp')): if hash_changed(base, filename, db) or not os.path.isfile(
preserve_cwd(base, process_pyx, root + '.pyx', root + '.cpp') os.path.join(base, root + ".cpp")
hash_add(base, root + '.cpp', db) ):
hash_add(base, root + '.pyx', db) preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
hash_add(base, root + ".cpp", db)
hash_add(base, root + ".pyx", db)
def check_changes(root, db): def check_changes(root, db):
res = False res = False
new_db = {} new_db = {}
setup_filename = 'setup.py' setup_filename = "setup.py"
hash_add('.', setup_filename, new_db) hash_add(".", setup_filename, new_db)
if hash_changed('.', setup_filename, db): if hash_changed(".", setup_filename, db):
res = True res = True
for base, _, files in os.walk(root): for base, _, files in os.walk(root):
for filename in files: for filename in files:
if filename.endswith('.pxd'): if filename.endswith(".pxd"):
hash_add(base, filename, new_db) hash_add(base, filename, new_db)
if hash_changed(base, filename, db): if hash_changed(base, filename, db):
res = True res = True
@ -150,8 +160,10 @@ def run(root):
save_hashes(db, HASH_FILE) save_hashes(db, HASH_FILE)
if __name__ == '__main__': if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Cythonize pyx files into C++ files as needed') parser = argparse.ArgumentParser(
parser.add_argument('root', help='root directory') description="Cythonize pyx files into C++ files as needed"
)
parser.add_argument("root", help="root directory")
args = parser.parse_args() args = parser.parse_args()
run(args.root) run(args.root)

View File

@ -15,12 +15,13 @@ _unset = object()
class Reddit(object): class Reddit(object):
"""Stream cleaned comments from Reddit.""" """Stream cleaned comments from Reddit."""
pre_format_re = re.compile(r'^[\`\*\~]')
post_format_re = re.compile(r'[\`\*\~]$')
url_re = re.compile(r'\[([^]]+)\]\(%%URL\)')
link_re = re.compile(r'\[([^]]+)\]\(https?://[^\)]+\)')
def __init__(self, file_path, meta_keys={'subreddit': 'section'}): pre_format_re = re.compile(r"^[\`\*\~]")
post_format_re = re.compile(r"[\`\*\~]$")
url_re = re.compile(r"\[([^]]+)\]\(%%URL\)")
link_re = re.compile(r"\[([^]]+)\]\(https?://[^\)]+\)")
def __init__(self, file_path, meta_keys={"subreddit": "section"}):
""" """
file_path (unicode / Path): Path to archive or directory of archives. file_path (unicode / Path): Path to archive or directory of archives.
meta_keys (dict): Meta data key included in the Reddit corpus, mapped meta_keys (dict): Meta data key included in the Reddit corpus, mapped
@ -45,28 +46,30 @@ class Reddit(object):
continue continue
comment = ujson.loads(line) comment = ujson.loads(line)
if self.is_valid(comment): if self.is_valid(comment):
text = self.strip_tags(comment['body']) text = self.strip_tags(comment["body"])
yield {'text': text} yield {"text": text}
def get_meta(self, item): def get_meta(self, item):
return {name: item.get(key, 'n/a') for key, name in self.meta.items()} return {name: item.get(key, "n/a") for key, name in self.meta.items()}
def iter_files(self): def iter_files(self):
for file_path in self.files: for file_path in self.files:
yield file_path yield file_path
def strip_tags(self, text): def strip_tags(self, text):
text = self.link_re.sub(r'\1', text) text = self.link_re.sub(r"\1", text)
text = text.replace('&gt;', '>').replace('&lt;', '<') text = text.replace("&gt;", ">").replace("&lt;", "<")
text = self.pre_format_re.sub('', text) text = self.pre_format_re.sub("", text)
text = self.post_format_re.sub('', text) text = self.post_format_re.sub("", text)
text = re.sub(r'\s+', ' ', text) text = re.sub(r"\s+", " ", text)
return text.strip() return text.strip()
def is_valid(self, comment): def is_valid(self, comment):
return comment['body'] is not None \ return (
and comment['body'] != '[deleted]' \ comment["body"] is not None
and comment['body'] != '[removed]' and comment["body"] != "[deleted]"
and comment["body"] != "[removed]"
)
def main(path): def main(path):
@ -75,8 +78,9 @@ def main(path):
print(ujson.dumps(comment)) print(ujson.dumps(comment))
if __name__ == '__main__': if __name__ == "__main__":
import socket import socket
try: try:
BrokenPipeError BrokenPipeError
except NameError: except NameError:
@ -85,6 +89,7 @@ if __name__ == '__main__':
plac.call(main) plac.call(main)
except BrokenPipeError: except BrokenPipeError:
import os, sys import os, sys
# Python flushes standard streams on exit; redirect remaining output # Python flushes standard streams on exit; redirect remaining output
# to devnull to avoid another BrokenPipeError at shutdown # to devnull to avoid another BrokenPipeError at shutdown
devnull = os.open(os.devnull, os.O_WRONLY) devnull = os.open(os.devnull, os.O_WRONLY)

View File

@@ -11,7 +11,10 @@ ujson>=1.35
 dill>=0.2,<0.3
 regex==2018.01.10
 requests>=2.13.0,<3.0.0
+pathlib==1.0.1; python_version < "3.4"
+# Development dependencies
 pytest>=4.0.0,<5.0.0
 pytest-timeout>=1.3.0,<2.0.0
 mock>=2.0.0,<3.0.0
-pathlib==1.0.1; python_version < "3.4"
+black==18.9b0
+flake8>=3.5.0,<3.6.0

View File

@ -14,8 +14,7 @@ from thinc.api import uniqued, wrap, noop
from thinc.api import with_square_sequences from thinc.api import with_square_sequences
from thinc.linear.linear import LinearModel from thinc.linear.linear import LinearModel
from thinc.neural.ops import NumpyOps, CupyOps from thinc.neural.ops import NumpyOps, CupyOps
from thinc.neural.util import get_array_module, copy_array from thinc.neural.util import get_array_module
from thinc.neural._lsuv import svd_orthonormal
from thinc.neural.optimizers import Adam from thinc.neural.optimizers import Adam
from thinc import describe from thinc import describe
@ -33,36 +32,36 @@ try:
except: except:
torch = None torch = None
VECTORS_KEY = 'spacy_pretrained_vectors' VECTORS_KEY = "spacy_pretrained_vectors"
def cosine(vec1, vec2): def cosine(vec1, vec2):
xp = get_array_module(vec1) xp = get_array_module(vec1)
norm1 = xp.linalg.norm(vec1) norm1 = xp.linalg.norm(vec1)
norm2 = xp.linalg.norm(vec2) norm2 = xp.linalg.norm(vec2)
if norm1 == 0. or norm2 == 0.: if norm1 == 0.0 or norm2 == 0.0:
return 0 return 0
else: else:
return vec1.dot(vec2) / (norm1 * norm2) return vec1.dot(vec2) / (norm1 * norm2)
def create_default_optimizer(ops, **cfg): def create_default_optimizer(ops, **cfg):
learn_rate = util.env_opt('learn_rate', 0.001) learn_rate = util.env_opt("learn_rate", 0.001)
beta1 = util.env_opt('optimizer_B1', 0.8) beta1 = util.env_opt("optimizer_B1", 0.8)
beta2 = util.env_opt('optimizer_B2', 0.8) beta2 = util.env_opt("optimizer_B2", 0.8)
eps = util.env_opt('optimizer_eps', 0.00001) eps = util.env_opt("optimizer_eps", 0.00001)
L2 = util.env_opt('L2_penalty', 1e-6) L2 = util.env_opt("L2_penalty", 1e-6)
max_grad_norm = util.env_opt('grad_norm_clip', 5.) max_grad_norm = util.env_opt("grad_norm_clip", 5.0)
optimizer = Adam(ops, learn_rate, L2=L2, beta1=beta1, optimizer = Adam(ops, learn_rate, L2=L2, beta1=beta1, beta2=beta2, eps=eps)
beta2=beta2, eps=eps)
optimizer.max_grad_norm = max_grad_norm optimizer.max_grad_norm = max_grad_norm
optimizer.device = ops.device optimizer.device = ops.device
return optimizer return optimizer
@layerize @layerize
def _flatten_add_lengths(seqs, pad=0, drop=0.): def _flatten_add_lengths(seqs, pad=0, drop=0.0):
ops = Model.ops ops = Model.ops
lengths = ops.asarray([len(seq) for seq in seqs], dtype='i') lengths = ops.asarray([len(seq) for seq in seqs], dtype="i")
def finish_update(d_X, sgd=None): def finish_update(d_X, sgd=None):
return ops.unflatten(d_X, lengths, pad=pad) return ops.unflatten(d_X, lengths, pad=pad)
@ -74,14 +73,15 @@ def _flatten_add_lengths(seqs, pad=0, drop=0.):
def _zero_init(model): def _zero_init(model):
def _zero_init_impl(self, X, y): def _zero_init_impl(self, X, y):
self.W.fill(0) self.W.fill(0)
model.on_data_hooks.append(_zero_init_impl) model.on_data_hooks.append(_zero_init_impl)
if model.W is not None: if model.W is not None:
model.W.fill(0.) model.W.fill(0.0)
return model return model
@layerize @layerize
def _preprocess_doc(docs, drop=0.): def _preprocess_doc(docs, drop=0.0):
keys = [doc.to_array(LOWER) for doc in docs] keys = [doc.to_array(LOWER) for doc in docs]
ops = Model.ops ops = Model.ops
# The dtype here matches what thinc is expecting -- which differs per # The dtype here matches what thinc is expecting -- which differs per
@ -89,11 +89,12 @@ def _preprocess_doc(docs, drop=0.):
# is fixed on Thinc's side. # is fixed on Thinc's side.
lengths = ops.asarray([arr.shape[0] for arr in keys], dtype=numpy.int_) lengths = ops.asarray([arr.shape[0] for arr in keys], dtype=numpy.int_)
keys = ops.xp.concatenate(keys) keys = ops.xp.concatenate(keys)
vals = ops.allocate(keys.shape) + 1. vals = ops.allocate(keys.shape) + 1.0
return (keys, vals, lengths), None return (keys, vals, lengths), None
@layerize @layerize
def _preprocess_doc_bigrams(docs, drop=0.): def _preprocess_doc_bigrams(docs, drop=0.0):
unigrams = [doc.to_array(LOWER) for doc in docs] unigrams = [doc.to_array(LOWER) for doc in docs]
ops = Model.ops ops = Model.ops
bigrams = [ops.ngrams(2, doc_unis) for doc_unis in unigrams] bigrams = [ops.ngrams(2, doc_unis) for doc_unis in unigrams]
@ -104,27 +105,29 @@ def _preprocess_doc_bigrams(docs, drop=0.):
# is fixed on Thinc's side. # is fixed on Thinc's side.
lengths = ops.asarray([arr.shape[0] for arr in keys], dtype=numpy.int_) lengths = ops.asarray([arr.shape[0] for arr in keys], dtype=numpy.int_)
keys = ops.xp.concatenate(keys) keys = ops.xp.concatenate(keys)
vals = ops.asarray(ops.xp.concatenate(vals), dtype='f') vals = ops.asarray(ops.xp.concatenate(vals), dtype="f")
return (keys, vals, lengths), None return (keys, vals, lengths), None
@describe.on_data(_set_dimensions_if_needed, @describe.on_data(
lambda model, X, y: model.init_weights(model)) _set_dimensions_if_needed, lambda model, X, y: model.init_weights(model)
)
@describe.attributes( @describe.attributes(
nI=Dimension("Input size"), nI=Dimension("Input size"),
nF=Dimension("Number of features"), nF=Dimension("Number of features"),
nO=Dimension("Output size"), nO=Dimension("Output size"),
nP=Dimension("Maxout pieces"), nP=Dimension("Maxout pieces"),
W=Synapses("Weights matrix", W=Synapses("Weights matrix", lambda obj: (obj.nF, obj.nO, obj.nP, obj.nI)),
lambda obj: (obj.nF, obj.nO, obj.nP, obj.nI)), b=Biases("Bias vector", lambda obj: (obj.nO, obj.nP)),
b=Biases("Bias vector", pad=Synapses(
lambda obj: (obj.nO, obj.nP)), "Pad",
pad=Synapses("Pad",
lambda obj: (1, obj.nF, obj.nO, obj.nP), lambda obj: (1, obj.nF, obj.nO, obj.nP),
lambda M, ops: ops.normal_init(M, 1.)), lambda M, ops: ops.normal_init(M, 1.0),
),
d_W=Gradient("W"), d_W=Gradient("W"),
d_pad=Gradient("pad"), d_pad=Gradient("pad"),
d_b=Gradient("b")) d_b=Gradient("b"),
)
class PrecomputableAffine(Model): class PrecomputableAffine(Model):
def __init__(self, nO=None, nI=None, nF=None, nP=None, **kwargs): def __init__(self, nO=None, nI=None, nF=None, nP=None, **kwargs):
Model.__init__(self, **kwargs) Model.__init__(self, **kwargs)
@ -133,9 +136,10 @@ class PrecomputableAffine(Model):
self.nI = nI self.nI = nI
self.nF = nF self.nF = nF
def begin_update(self, X, drop=0.): def begin_update(self, X, drop=0.0):
Yf = self.ops.gemm(X, Yf = self.ops.gemm(
self.W.reshape((self.nF*self.nO*self.nP, self.nI)), trans2=True) X, self.W.reshape((self.nF * self.nO * self.nP, self.nI)), trans2=True
)
Yf = Yf.reshape((Yf.shape[0], self.nF, self.nO, self.nP)) Yf = Yf.reshape((Yf.shape[0], self.nF, self.nO, self.nP))
Yf = self._add_padding(Yf) Yf = self._add_padding(Yf)
@ -154,7 +158,8 @@ class PrecomputableAffine(Model):
dXf = self.ops.gemm(dY.reshape((dY.shape[0], self.nO * self.nP)), Wopfi) dXf = self.ops.gemm(dY.reshape((dY.shape[0], self.nO * self.nP)), Wopfi)
# Reuse the buffer # Reuse the buffer
dWopfi = Wopfi; dWopfi.fill(0.) dWopfi = Wopfi
dWopfi.fill(0.0)
self.ops.gemm(dY, Xf, out=dWopfi, trans1=True) self.ops.gemm(dY, Xf, out=dWopfi, trans1=True)
dWopfi = dWopfi.reshape((self.nO, self.nP, self.nF, self.nI)) dWopfi = dWopfi.reshape((self.nO, self.nP, self.nF, self.nI))
# (o, p, f, i) --> (f, o, p, i) # (o, p, f, i) --> (f, o, p, i)
@ -163,6 +168,7 @@ class PrecomputableAffine(Model):
if sgd is not None: if sgd is not None:
sgd(self._mem.weights, self._mem.gradient, key=self.id) sgd(self._mem.weights, self._mem.gradient, key=self.id)
return dXf.reshape((dXf.shape[0], self.nF, self.nI)) return dXf.reshape((dXf.shape[0], self.nF, self.nI))
return Yf, backward return Yf, backward
def _add_padding(self, Yf): def _add_padding(self, Yf):
@ -171,7 +177,7 @@ class PrecomputableAffine(Model):
def _backprop_padding(self, dY, ids): def _backprop_padding(self, dY, ids):
# (1, nF, nO, nP) += (nN, nF, nO, nP) where IDs (nN, nF) < 0 # (1, nF, nO, nP) += (nN, nF, nO, nP) where IDs (nN, nF) < 0
mask = ids < 0. mask = ids < 0.0
mask = mask.sum(axis=1) mask = mask.sum(axis=1)
d_pad = dY * mask.reshape((ids.shape[0], 1, 1)) d_pad = dY * mask.reshape((ids.shape[0], 1, 1))
self.d_pad += d_pad.sum(axis=0) self.d_pad += d_pad.sum(axis=0)
@ -179,33 +185,36 @@ class PrecomputableAffine(Model):
@staticmethod @staticmethod
def init_weights(model): def init_weights(model):
'''This is like the 'layer sequential unit variance', but instead """This is like the 'layer sequential unit variance', but instead
of taking the actual inputs, we randomly generate whitened data. of taking the actual inputs, we randomly generate whitened data.
Why's this all so complicated? We have a huge number of inputs, Why's this all so complicated? We have a huge number of inputs,
and the maxout unit makes guessing the dynamics tricky. Instead and the maxout unit makes guessing the dynamics tricky. Instead
we set the maxout weights to values that empirically result in we set the maxout weights to values that empirically result in
whitened outputs given whitened inputs. whitened outputs given whitened inputs.
''' """
if (model.W**2).sum() != 0.: if (model.W ** 2).sum() != 0.0:
return return
ops = model.ops ops = model.ops
xp = ops.xp xp = ops.xp
ops.normal_init(model.W, model.nF * model.nI, inplace=True) ops.normal_init(model.W, model.nF * model.nI, inplace=True)
ids = ops.allocate((5000, model.nF), dtype='f') ids = ops.allocate((5000, model.nF), dtype="f")
ids += xp.random.uniform(0, 1000, ids.shape) ids += xp.random.uniform(0, 1000, ids.shape)
ids = ops.asarray(ids, dtype='i') ids = ops.asarray(ids, dtype="i")
tokvecs = ops.allocate((5000, model.nI), dtype='f') tokvecs = ops.allocate((5000, model.nI), dtype="f")
tokvecs += xp.random.normal(loc=0., scale=1., tokvecs += xp.random.normal(loc=0.0, scale=1.0, size=tokvecs.size).reshape(
size=tokvecs.size).reshape(tokvecs.shape) tokvecs.shape
)
def predict(ids, tokvecs): def predict(ids, tokvecs):
# nS ids. nW tokvecs. Exclude the padding array. # nS ids. nW tokvecs. Exclude the padding array.
hiddens = model(tokvecs[:-1]) # (nW, f, o, p) hiddens = model(tokvecs[:-1]) # (nW, f, o, p)
vectors = model.ops.allocate((ids.shape[0], model.nO * model.nP), dtype='f') vectors = model.ops.allocate((ids.shape[0], model.nO * model.nP), dtype="f")
# need nS vectors # need nS vectors
hiddens = hiddens.reshape((hiddens.shape[0] * model.nF, model.nO * model.nP)) hiddens = hiddens.reshape(
(hiddens.shape[0] * model.nF, model.nO * model.nP)
)
model.ops.scatter_add(vectors, ids.flatten(), hiddens) model.ops.scatter_add(vectors, ids.flatten(), hiddens)
vectors = vectors.reshape((vectors.shape[0], model.nO, model.nP)) vectors = vectors.reshape((vectors.shape[0], model.nO, model.nP))
vectors += model.b vectors += model.b
@ -238,7 +247,8 @@ def link_vectors_to_models(vocab):
if vectors.data.size != 0: if vectors.data.size != 0:
print( print(
"Warning: Unnamed vectors -- this won't allow multiple vectors " "Warning: Unnamed vectors -- this won't allow multiple vectors "
"models to be loaded. (Shape: (%d, %d))" % vectors.data.shape) "models to be loaded. (Shape: (%d, %d))" % vectors.data.shape
)
ops = Model.ops ops = Model.ops
for word in vocab: for word in vocab:
if word.orth in vectors.key2row: if word.orth in vectors.key2row:
@ -259,23 +269,26 @@ def PyTorchBiLSTM(nO, nI, depth, dropout=0.2):
def Tok2Vec(width, embed_size, **kwargs): def Tok2Vec(width, embed_size, **kwargs):
pretrained_vectors = kwargs.get('pretrained_vectors', None) pretrained_vectors = kwargs.get("pretrained_vectors", None)
cnn_maxout_pieces = kwargs.get('cnn_maxout_pieces', 2) cnn_maxout_pieces = kwargs.get("cnn_maxout_pieces", 2)
subword_features = kwargs.get('subword_features', True) subword_features = kwargs.get("subword_features", True)
conv_depth = kwargs.get('conv_depth', 4) conv_depth = kwargs.get("conv_depth", 4)
bilstm_depth = kwargs.get('bilstm_depth', 0) bilstm_depth = kwargs.get("bilstm_depth", 0)
cols = [ID, NORM, PREFIX, SUFFIX, SHAPE, ORTH] cols = [ID, NORM, PREFIX, SUFFIX, SHAPE, ORTH]
with Model.define_operators({'>>': chain, '|': concatenate, '**': clone, with Model.define_operators(
'+': add, '*': reapply}): {">>": chain, "|": concatenate, "**": clone, "+": add, "*": reapply}
norm = HashEmbed(width, embed_size, column=cols.index(NORM), ):
name='embed_norm') norm = HashEmbed(width, embed_size, column=cols.index(NORM), name="embed_norm")
if subword_features: if subword_features:
prefix = HashEmbed(width, embed_size//2, column=cols.index(PREFIX), prefix = HashEmbed(
name='embed_prefix') width, embed_size // 2, column=cols.index(PREFIX), name="embed_prefix"
suffix = HashEmbed(width, embed_size//2, column=cols.index(SUFFIX), )
name='embed_suffix') suffix = HashEmbed(
shape = HashEmbed(width, embed_size//2, column=cols.index(SHAPE), width, embed_size // 2, column=cols.index(SUFFIX), name="embed_suffix"
name='embed_shape') )
shape = HashEmbed(
width, embed_size // 2, column=cols.index(SHAPE), name="embed_shape"
)
else: else:
prefix, suffix, shape = (None, None, None) prefix, suffix, shape = (None, None, None)
if pretrained_vectors is not None: if pretrained_vectors is not None:
@ -284,15 +297,20 @@ def Tok2Vec(width, embed_size, **kwargs):
if subword_features: if subword_features:
embed = uniqued( embed = uniqued(
(glove | norm | prefix | suffix | shape) (glove | norm | prefix | suffix | shape)
>> LN(Maxout(width, width*5, pieces=3)), column=cols.index(ORTH)) >> LN(Maxout(width, width * 5, pieces=3)),
column=cols.index(ORTH),
)
else: else:
embed = uniqued( embed = uniqued(
(glove | norm) (glove | norm) >> LN(Maxout(width, width * 2, pieces=3)),
>> LN(Maxout(width, width*2, pieces=3)), column=cols.index(ORTH)) column=cols.index(ORTH),
)
elif subword_features: elif subword_features:
embed = uniqued( embed = uniqued(
(norm | prefix | suffix | shape) (norm | prefix | suffix | shape)
>> LN(Maxout(width, width*4, pieces=3)), column=cols.index(ORTH)) >> LN(Maxout(width, width * 4, pieces=3)),
column=cols.index(ORTH),
)
else: else:
embed = norm embed = norm
@ -300,12 +318,8 @@ def Tok2Vec(width, embed_size, **kwargs):
ExtractWindow(nW=1) ExtractWindow(nW=1)
>> LN(Maxout(width, width * 3, pieces=cnn_maxout_pieces)) >> LN(Maxout(width, width * 3, pieces=cnn_maxout_pieces))
) )
tok2vec = ( tok2vec = FeatureExtracter(cols) >> with_flatten(
FeatureExtracter(cols) embed >> convolution ** conv_depth, pad=conv_depth
>> with_flatten(
embed
>> convolution ** conv_depth, pad=conv_depth
)
) )
if bilstm_depth >= 1: if bilstm_depth >= 1:
tok2vec = tok2vec >> PyTorchBiLSTM(width, width, bilstm_depth) tok2vec = tok2vec >> PyTorchBiLSTM(width, width, bilstm_depth)
@ -316,7 +330,7 @@ def Tok2Vec(width, embed_size, **kwargs):
def reapply(layer, n_times): def reapply(layer, n_times):
def reapply_fwd(X, drop=0.): def reapply_fwd(X, drop=0.0):
backprops = [] backprops = []
for i in range(n_times): for i in range(n_times):
Y, backprop = layer.begin_update(X, drop=drop) Y, backprop = layer.begin_update(X, drop=drop)
@ -334,12 +348,14 @@ def reapply(layer, n_times):
return dX return dX
return Y, reapply_bwd return Y, reapply_bwd
return wrap(reapply_fwd, layer) return wrap(reapply_fwd, layer)
def asarray(ops, dtype): def asarray(ops, dtype):
def forward(X, drop=0.): def forward(X, drop=0.0):
return ops.asarray(X, dtype=dtype), None return ops.asarray(X, dtype=dtype), None
return layerize(forward) return layerize(forward)
@ -356,7 +372,7 @@ def get_col(idx):
if idx < 0: if idx < 0:
raise IndexError(Errors.E066.format(value=idx)) raise IndexError(Errors.E066.format(value=idx))
def forward(X, drop=0.): def forward(X, drop=0.0):
if isinstance(X, numpy.ndarray): if isinstance(X, numpy.ndarray):
ops = NumpyOps() ops = NumpyOps()
else: else:
@ -377,7 +393,7 @@ def doc2feats(cols=None):
if cols is None: if cols is None:
cols = [ID, NORM, PREFIX, SUFFIX, SHAPE, ORTH] cols = [ID, NORM, PREFIX, SUFFIX, SHAPE, ORTH]
def forward(docs, drop=0.): def forward(docs, drop=0.0):
feats = [] feats = []
for doc in docs: for doc in docs:
feats.append(doc.to_array(cols)) feats.append(doc.to_array(cols))
@ -389,13 +405,14 @@ def doc2feats(cols=None):
def print_shape(prefix): def print_shape(prefix):
def forward(X, drop=0.): def forward(X, drop=0.0):
return X, lambda dX, **kwargs: dX return X, lambda dX, **kwargs: dX
return layerize(forward) return layerize(forward)
@layerize @layerize
def get_token_vectors(tokens_attrs_vectors, drop=0.): def get_token_vectors(tokens_attrs_vectors, drop=0.0):
tokens, attrs, vectors = tokens_attrs_vectors tokens, attrs, vectors = tokens_attrs_vectors
def backward(d_output, sgd=None): def backward(d_output, sgd=None):
@ -405,14 +422,14 @@ def get_token_vectors(tokens_attrs_vectors, drop=0.):
@layerize @layerize
def logistic(X, drop=0.): def logistic(X, drop=0.0):
xp = get_array_module(X) xp = get_array_module(X)
if not isinstance(X, xp.ndarray): if not isinstance(X, xp.ndarray):
X = xp.asarray(X) X = xp.asarray(X)
# Clip to range (-10, 10) # Clip to range (-10, 10)
X = xp.minimum(X, 10., X) X = xp.minimum(X, 10.0, X)
X = xp.maximum(X, -10., X) X = xp.maximum(X, -10.0, X)
Y = 1. / (1. + xp.exp(-X)) Y = 1.0 / (1.0 + xp.exp(-X))
def logistic_bwd(dY, sgd=None): def logistic_bwd(dY, sgd=None):
dX = dY * (Y * (1 - Y)) dX = dY * (Y * (1 - Y))
@ -424,12 +441,13 @@ def logistic(X, drop=0.):
def zero_init(model): def zero_init(model):
def _zero_init_impl(self, X, y): def _zero_init_impl(self, X, y):
self.W.fill(0) self.W.fill(0)
model.on_data_hooks.append(_zero_init_impl) model.on_data_hooks.append(_zero_init_impl)
return model return model
@layerize @layerize
def preprocess_doc(docs, drop=0.): def preprocess_doc(docs, drop=0.0):
keys = [doc.to_array([LOWER]) for doc in docs] keys = [doc.to_array([LOWER]) for doc in docs]
ops = Model.ops ops = Model.ops
lengths = ops.asarray([arr.shape[0] for arr in keys]) lengths = ops.asarray([arr.shape[0] for arr in keys])
@ -439,31 +457,32 @@ def preprocess_doc(docs, drop=0.):
def getitem(i): def getitem(i):
def getitem_fwd(X, drop=0.): def getitem_fwd(X, drop=0.0):
return X[i], None return X[i], None
return layerize(getitem_fwd) return layerize(getitem_fwd)
def build_tagger_model(nr_class, **cfg): def build_tagger_model(nr_class, **cfg):
embed_size = util.env_opt('embed_size', 2000) embed_size = util.env_opt("embed_size", 2000)
if 'token_vector_width' in cfg: if "token_vector_width" in cfg:
token_vector_width = cfg['token_vector_width'] token_vector_width = cfg["token_vector_width"]
else: else:
token_vector_width = util.env_opt('token_vector_width', 96) token_vector_width = util.env_opt("token_vector_width", 96)
pretrained_vectors = cfg.get('pretrained_vectors') pretrained_vectors = cfg.get("pretrained_vectors")
subword_features = cfg.get('subword_features', True) subword_features = cfg.get("subword_features", True)
with Model.define_operators({'>>': chain, '+': add}): with Model.define_operators({">>": chain, "+": add}):
if 'tok2vec' in cfg: if "tok2vec" in cfg:
tok2vec = cfg['tok2vec'] tok2vec = cfg["tok2vec"]
else: else:
tok2vec = Tok2Vec(token_vector_width, embed_size, tok2vec = Tok2Vec(
token_vector_width,
embed_size,
subword_features=subword_features, subword_features=subword_features,
pretrained_vectors=pretrained_vectors) pretrained_vectors=pretrained_vectors,
softmax = with_flatten(Softmax(nr_class, token_vector_width))
model = (
tok2vec
>> softmax
) )
softmax = with_flatten(Softmax(nr_class, token_vector_width))
model = tok2vec >> softmax
model.nI = None model.nI = None
model.tok2vec = tok2vec model.tok2vec = tok2vec
model.softmax = softmax model.softmax = softmax
@ -471,10 +490,10 @@ def build_tagger_model(nr_class, **cfg):
@layerize @layerize
def SpacyVectors(docs, drop=0.): def SpacyVectors(docs, drop=0.0):
batch = [] batch = []
for doc in docs: for doc in docs:
indices = numpy.zeros((len(doc),), dtype='i') indices = numpy.zeros((len(doc),), dtype="i")
for i, word in enumerate(doc): for i, word in enumerate(doc):
if word.orth in doc.vocab.vectors.key2row: if word.orth in doc.vocab.vectors.key2row:
indices[i] = doc.vocab.vectors.key2row[word.orth] indices[i] = doc.vocab.vectors.key2row[word.orth]
@ -486,12 +505,11 @@ def SpacyVectors(docs, drop=0.):
def build_text_classifier(nr_class, width=64, **cfg): def build_text_classifier(nr_class, width=64, **cfg):
depth = cfg.get('depth', 2) depth = cfg.get("depth", 2)
nr_vector = cfg.get('nr_vector', 5000) nr_vector = cfg.get("nr_vector", 5000)
pretrained_dims = cfg.get('pretrained_dims', 0) pretrained_dims = cfg.get("pretrained_dims", 0)
with Model.define_operators({'>>': chain, '+': add, '|': concatenate, with Model.define_operators({">>": chain, "+": add, "|": concatenate, "**": clone}):
'**': clone}): if cfg.get("low_data") and pretrained_dims:
if cfg.get('low_data') and pretrained_dims:
model = ( model = (
SpacyVectors SpacyVectors
>> flatten_add_lengths >> flatten_add_lengths
@ -509,21 +527,19 @@ def build_text_classifier(nr_class, width=64, **cfg):
suffix = HashEmbed(width // 2, nr_vector, column=3) suffix = HashEmbed(width // 2, nr_vector, column=3)
shape = HashEmbed(width // 2, nr_vector, column=4) shape = HashEmbed(width // 2, nr_vector, column=4)
trained_vectors = ( trained_vectors = FeatureExtracter(
FeatureExtracter([ORTH, LOWER, PREFIX, SUFFIX, SHAPE, ID]) [ORTH, LOWER, PREFIX, SUFFIX, SHAPE, ID]
>> with_flatten( ) >> with_flatten(
uniqued( uniqued(
(lower | prefix | suffix | shape) (lower | prefix | suffix | shape)
>> LN(Maxout(width, width + (width // 2) * 3)), >> LN(Maxout(width, width + (width // 2) * 3)),
column=0 column=0,
)
) )
) )
if pretrained_dims: if pretrained_dims:
static_vectors = ( static_vectors = SpacyVectors >> with_flatten(
SpacyVectors Affine(width, pretrained_dims)
>> with_flatten(Affine(width, pretrained_dims))
) )
# TODO Make concatenate support lists # TODO Make concatenate support lists
vectors = concatenate_lists(trained_vectors, static_vectors) vectors = concatenate_lists(trained_vectors, static_vectors)
@ -532,14 +548,10 @@ def build_text_classifier(nr_class, width=64, **cfg):
vectors = trained_vectors vectors = trained_vectors
vectors_width = width vectors_width = width
static_vectors = None static_vectors = None
tok2vec = ( tok2vec = vectors >> with_flatten(
vectors
>> with_flatten(
LN(Maxout(width, vectors_width)) LN(Maxout(width, vectors_width))
>> Residual( >> Residual((ExtractWindow(nW=1) >> LN(Maxout(width, width * 3)))) ** depth,
(ExtractWindow(nW=1) >> LN(Maxout(width, width*3))) pad=depth,
) ** depth, pad=depth
)
) )
cnn_model = ( cnn_model = (
tok2vec tok2vec
@ -550,10 +562,7 @@ def build_text_classifier(nr_class, width=64, **cfg):
>> zero_init(Affine(nr_class, width, drop_factor=0.0)) >> zero_init(Affine(nr_class, width, drop_factor=0.0))
) )
linear_model = ( linear_model = _preprocess_doc >> LinearModel(nr_class)
_preprocess_doc
>> LinearModel(nr_class)
)
model = ( model = (
(linear_model | cnn_model) (linear_model | cnn_model)
>> zero_init(Affine(nr_class, nr_class * 2, drop_factor=0.0)) >> zero_init(Affine(nr_class, nr_class * 2, drop_factor=0.0))
@ -566,9 +575,9 @@ def build_text_classifier(nr_class, width=64, **cfg):
@layerize @layerize
def flatten(seqs, drop=0.): def flatten(seqs, drop=0.0):
ops = Model.ops ops = Model.ops
lengths = ops.asarray([len(seq) for seq in seqs], dtype='i') lengths = ops.asarray([len(seq) for seq in seqs], dtype="i")
def finish_update(d_X, sgd=None): def finish_update(d_X, sgd=None):
return ops.unflatten(d_X, lengths, pad=0) return ops.unflatten(d_X, lengths, pad=0)
@ -583,14 +592,14 @@ def concatenate_lists(*layers, **kwargs): # pragma: no cover
""" """
if not layers: if not layers:
return noop() return noop()
drop_factor = kwargs.get('drop_factor', 1.0) drop_factor = kwargs.get("drop_factor", 1.0)
ops = layers[0].ops ops = layers[0].ops
layers = [chain(layer, flatten) for layer in layers] layers = [chain(layer, flatten) for layer in layers]
concat = concatenate(*layers) concat = concatenate(*layers)
def concatenate_lists_fwd(Xs, drop=0.): def concatenate_lists_fwd(Xs, drop=0.0):
drop *= drop_factor drop *= drop_factor
lengths = ops.asarray([len(X) for X in Xs], dtype='i') lengths = ops.asarray([len(X) for X in Xs], dtype="i")
flat_y, bp_flat_y = concat.begin_update(Xs, drop=drop) flat_y, bp_flat_y = concat.begin_update(Xs, drop=drop)
ys = ops.unflatten(flat_y, lengths) ys = ops.unflatten(flat_y, lengths)

View File

@@ -1,16 +1,17 @@
 # inspired from:
 # https://python-packaging-user-guide.readthedocs.org/en/latest/single_source_version/
 # https://github.com/pypa/warehouse/blob/master/warehouse/__about__.py
+# fmt: off
-__title__ = 'spacy-nightly'
-__version__ = '2.1.0a3'
-__summary__ = 'Industrial-strength Natural Language Processing (NLP) with Python and Cython'
-__uri__ = 'https://spacy.io'
-__author__ = 'Explosion AI'
-__email__ = 'contact@explosion.ai'
-__license__ = 'MIT'
+__title__ = "spacy-nightly"
+__version__ = "2.1.0a3"
+__summary__ = "Industrial-strength Natural Language Processing (NLP) with Python and Cython"
+__uri__ = "https://spacy.io"
+__author__ = "Explosion AI"
+__email__ = "contact@explosion.ai"
+__license__ = "MIT"
 __release__ = False
-__download_url__ = 'https://github.com/explosion/spacy-models/releases/download'
-__compatibility__ = 'https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json'
-__shortcuts__ = 'https://raw.githubusercontent.com/explosion/spacy-models/master/shortcuts-v2.json'
+__download_url__ = "https://github.com/explosion/spacy-models/releases/download"
+__compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
+__shortcuts__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/shortcuts-v2.json"

View File

@@ -6,7 +6,6 @@ import sys
 import ujson
 import itertools
 import locale
-import os
 
 from thinc.neural.util import copy_array
@@ -31,9 +30,9 @@ except ImportError:
     cupy = None
 
 try:
-    from thinc.neural.optimizers import Optimizer
+    from thinc.neural.optimizers import Optimizer  # noqa: F401
 except ImportError:
-    from thinc.neural.optimizers import Adam as Optimizer
+    from thinc.neural.optimizers import Adam as Optimizer  # noqa: F401
 
 pickle = pickle
 copy_reg = copy_reg

View File

@ -12,8 +12,15 @@ _html = {}
IS_JUPYTER = is_in_jupyter() IS_JUPYTER = is_in_jupyter()
def render(docs, style='dep', page=False, minify=False, jupyter=IS_JUPYTER, def render(
options={}, manual=False): docs,
style="dep",
page=False,
minify=False,
jupyter=IS_JUPYTER,
options={},
manual=False,
):
"""Render displaCy visualisation. """Render displaCy visualisation.
docs (list or Doc): Document(s) to visualise. docs (list or Doc): Document(s) to visualise.
@ -25,8 +32,10 @@ def render(docs, style='dep', page=False, minify=False, jupyter=IS_JUPYTER,
manual (bool): Don't parse `Doc` and instead expect a dict/list of dicts. manual (bool): Don't parse `Doc` and instead expect a dict/list of dicts.
RETURNS (unicode): Rendered HTML markup. RETURNS (unicode): Rendered HTML markup.
""" """
factories = {'dep': (DependencyRenderer, parse_deps), factories = {
'ent': (EntityRenderer, parse_ents)} "dep": (DependencyRenderer, parse_deps),
"ent": (EntityRenderer, parse_ents),
}
if style not in factories: if style not in factories:
raise ValueError(Errors.E087.format(style=style)) raise ValueError(Errors.E087.format(style=style))
if isinstance(docs, (Doc, Span, dict)): if isinstance(docs, (Doc, Span, dict)):
@ -37,16 +46,18 @@ def render(docs, style='dep', page=False, minify=False, jupyter=IS_JUPYTER,
renderer, converter = factories[style] renderer, converter = factories[style]
renderer = renderer(options=options) renderer = renderer(options=options)
parsed = [converter(doc, options) for doc in docs] if not manual else docs parsed = [converter(doc, options) for doc in docs] if not manual else docs
_html['parsed'] = renderer.render(parsed, page=page, minify=minify).strip() _html["parsed"] = renderer.render(parsed, page=page, minify=minify).strip()
html = _html['parsed'] html = _html["parsed"]
if jupyter: # return HTML rendered by IPython display() if jupyter: # return HTML rendered by IPython display()
from IPython.core.display import display, HTML from IPython.core.display import display, HTML
return display(HTML(html)) return display(HTML(html))
return html return html
def serve(docs, style='dep', page=True, minify=False, options={}, manual=False, def serve(
port=5000): docs, style="dep", page=True, minify=False, options={}, manual=False, port=5000
):
"""Serve displaCy visualisation. """Serve displaCy visualisation.
docs (list or Doc): Document(s) to visualise. docs (list or Doc): Document(s) to visualise.
@ -58,11 +69,13 @@ def serve(docs, style='dep', page=True, minify=False, options={}, manual=False,
port (int): Port to serve visualisation. port (int): Port to serve visualisation.
""" """
from wsgiref import simple_server from wsgiref import simple_server
render(docs, style=style, page=page, minify=minify, options=options,
manual=manual) render(docs, style=style, page=page, minify=minify, options=options, manual=manual)
httpd = simple_server.make_server('0.0.0.0', port, app) httpd = simple_server.make_server("0.0.0.0", port, app)
prints("Using the '{}' visualizer".format(style), prints(
title="Serving on port {}...".format(port)) "Using the '{}' visualizer".format(style),
title="Serving on port {}...".format(port),
)
try: try:
httpd.serve_forever() httpd.serve_forever()
except KeyboardInterrupt: except KeyboardInterrupt:
@ -72,11 +85,10 @@ def serve(docs, style='dep', page=True, minify=False, options={}, manual=False,
def app(environ, start_response): def app(environ, start_response):
# headers and status need to be bytes in Python 2, see #1227 # Headers and status need to be bytes in Python 2, see #1227
headers = [(b_to_str(b'Content-type'), headers = [(b_to_str(b"Content-type"), b_to_str(b"text/html; charset=utf-8"))]
b_to_str(b'text/html; charset=utf-8'))] start_response(b_to_str(b"200 OK"), headers)
start_response(b_to_str(b'200 OK'), headers) res = _html["parsed"].encode(encoding="utf-8")
res = _html['parsed'].encode(encoding='utf-8')
return [res] return [res]
@ -89,11 +101,10 @@ def parse_deps(orig_doc, options={}):
doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes()) doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes())
if not doc.is_parsed: if not doc.is_parsed:
user_warning(Warnings.W005) user_warning(Warnings.W005)
if options.get('collapse_phrases', False): if options.get("collapse_phrases", False):
for np in list(doc.noun_chunks): for np in list(doc.noun_chunks):
np.merge(tag=np.root.tag_, lemma=np.root.lemma_, np.merge(tag=np.root.tag_, lemma=np.root.lemma_, ent_type=np.root.ent_type_)
ent_type=np.root.ent_type_) if options.get("collapse_punct", True):
if options.get('collapse_punct', True):
spans = [] spans = []
for word in doc[:-1]: for word in doc[:-1]:
if word.is_punct or not word.nbor(1).is_punct: if word.is_punct or not word.nbor(1).is_punct:
@ -103,23 +114,31 @@ def parse_deps(orig_doc, options={}):
while end < len(doc) and doc[end].is_punct: while end < len(doc) and doc[end].is_punct:
end += 1 end += 1
span = doc[start:end] span = doc[start:end]
spans.append((span.start_char, span.end_char, word.tag_, spans.append(
word.lemma_, word.ent_type_)) (span.start_char, span.end_char, word.tag_, word.lemma_, word.ent_type_)
)
for start, end, tag, lemma, ent_type in spans: for start, end, tag, lemma, ent_type in spans:
doc.merge(start, end, tag=tag, lemma=lemma, ent_type=ent_type) doc.merge(start, end, tag=tag, lemma=lemma, ent_type=ent_type)
if options.get('fine_grained'): if options.get("fine_grained"):
words = [{'text': w.text, 'tag': w.tag_} for w in doc] words = [{"text": w.text, "tag": w.tag_} for w in doc]
else: else:
words = [{'text': w.text, 'tag': w.pos_} for w in doc] words = [{"text": w.text, "tag": w.pos_} for w in doc]
arcs = [] arcs = []
for word in doc: for word in doc:
if word.i < word.head.i: if word.i < word.head.i:
arcs.append({'start': word.i, 'end': word.head.i, arcs.append(
'label': word.dep_, 'dir': 'left'}) {"start": word.i, "end": word.head.i, "label": word.dep_, "dir": "left"}
)
elif word.i > word.head.i: elif word.i > word.head.i:
arcs.append({'start': word.head.i, 'end': word.i, arcs.append(
'label': word.dep_, 'dir': 'right'}) {
return {'words': words, 'arcs': arcs} "start": word.head.i,
"end": word.i,
"label": word.dep_,
"dir": "right",
}
)
return {"words": words, "arcs": arcs}
def parse_ents(doc, options={}): def parse_ents(doc, options={}):
@ -128,10 +147,11 @@ def parse_ents(doc, options={}):
doc (Doc): Document do parse. doc (Doc): Document do parse.
RETURNS (dict): Generated entities keyed by text (original text) and ents. RETURNS (dict): Generated entities keyed by text (original text) and ents.
""" """
ents = [{'start': ent.start_char, 'end': ent.end_char, 'label': ent.label_} ents = [
for ent in doc.ents] {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
for ent in doc.ents
]
if not ents: if not ents:
user_warning(Warnings.W006) user_warning(Warnings.W006)
title = (doc.user_data.get('title', None) title = doc.user_data.get("title", None) if hasattr(doc, "user_data") else None
if hasattr(doc, 'user_data') else None) return {"text": doc.text, "ents": ents, "title": title}
return {'text': doc.text, 'ents': ents, 'title': title}

View File

@ -10,7 +10,8 @@ from ..util import minify_html, escape_html
class DependencyRenderer(object): class DependencyRenderer(object):
"""Render dependency parses as SVGs.""" """Render dependency parses as SVGs."""
style = 'dep'
style = "dep"
def __init__(self, options={}): def __init__(self, options={}):
"""Initialise dependency renderer. """Initialise dependency renderer.
@ -19,18 +20,16 @@ class DependencyRenderer(object):
arrow_spacing, arrow_width, arrow_stroke, distance, offset_x, arrow_spacing, arrow_width, arrow_stroke, distance, offset_x,
color, bg, font) color, bg, font)
""" """
self.compact = options.get('compact', False) self.compact = options.get("compact", False)
self.word_spacing = options.get('word_spacing', 45) self.word_spacing = options.get("word_spacing", 45)
self.arrow_spacing = options.get('arrow_spacing', self.arrow_spacing = options.get("arrow_spacing", 12 if self.compact else 20)
12 if self.compact else 20) self.arrow_width = options.get("arrow_width", 6 if self.compact else 10)
self.arrow_width = options.get('arrow_width', self.arrow_stroke = options.get("arrow_stroke", 2)
6 if self.compact else 10) self.distance = options.get("distance", 150 if self.compact else 175)
self.arrow_stroke = options.get('arrow_stroke', 2) self.offset_x = options.get("offset_x", 50)
self.distance = options.get('distance', 150 if self.compact else 175) self.color = options.get("color", "#000000")
self.offset_x = options.get('offset_x', 50) self.bg = options.get("bg", "#ffffff")
self.color = options.get('color', '#000000') self.font = options.get("font", "Arial")
self.bg = options.get('bg', '#ffffff')
self.font = options.get('font', 'Arial')
def render(self, parsed, page=False, minify=False): def render(self, parsed, page=False, minify=False):
"""Render complete markup. """Render complete markup.
@ -43,14 +42,15 @@ class DependencyRenderer(object):
# Create a random ID prefix to make sure parses don't receive the # Create a random ID prefix to make sure parses don't receive the
# same ID, even if they're identical # same ID, even if they're identical
id_prefix = random.randint(0, 999) id_prefix = random.randint(0, 999)
rendered = [self.render_svg('{}-{}'.format(id_prefix, i), p['words'], p['arcs']) rendered = [
for i, p in enumerate(parsed)] self.render_svg("{}-{}".format(id_prefix, i), p["words"], p["arcs"])
for i, p in enumerate(parsed)
]
if page: if page:
content = ''.join([TPL_FIGURE.format(content=svg) content = "".join([TPL_FIGURE.format(content=svg) for svg in rendered])
for svg in rendered])
markup = TPL_PAGE.format(content=content) markup = TPL_PAGE.format(content=content)
else: else:
markup = ''.join(rendered) markup = "".join(rendered)
if minify: if minify:
return minify_html(markup) return minify_html(markup)
return markup return markup
@ -69,15 +69,21 @@ class DependencyRenderer(object):
self.width = self.offset_x + len(words) * self.distance self.width = self.offset_x + len(words) * self.distance
self.height = self.offset_y + 3 * self.word_spacing self.height = self.offset_y + 3 * self.word_spacing
self.id = render_id self.id = render_id
words = [self.render_word(w['text'], w['tag'], i) words = [self.render_word(w["text"], w["tag"], i) for i, w in enumerate(words)]
for i, w in enumerate(words)] arcs = [
arcs = [self.render_arrow(a['label'], a['start'], self.render_arrow(a["label"], a["start"], a["end"], a["dir"], i)
a['end'], a['dir'], i) for i, a in enumerate(arcs)
for i, a in enumerate(arcs)] ]
content = ''.join(words) + ''.join(arcs) content = "".join(words) + "".join(arcs)
return TPL_DEP_SVG.format(id=self.id, width=self.width, return TPL_DEP_SVG.format(
height=self.height, color=self.color, id=self.id,
bg=self.bg, font=self.font, content=content) width=self.width,
height=self.height,
color=self.color,
bg=self.bg,
font=self.font,
content=content,
)
def render_word(self, text, tag, i): def render_word(self, text, tag, i):
"""Render individual word. """Render individual word.
@ -92,7 +98,6 @@ class DependencyRenderer(object):
html_text = escape_html(text) html_text = escape_html(text)
return TPL_DEP_WORDS.format(text=html_text, tag=tag, x=x, y=y) return TPL_DEP_WORDS.format(text=html_text, tag=tag, x=x, y=y)
def render_arrow(self, label, start, end, direction, i): def render_arrow(self, label, start, end, direction, i):
"""Render indivicual arrow. """Render indivicual arrow.
@ -106,8 +111,12 @@ class DependencyRenderer(object):
level = self.levels.index(end - start) + 1 level = self.levels.index(end - start) + 1
x_start = self.offset_x + start * self.distance + self.arrow_spacing x_start = self.offset_x + start * self.distance + self.arrow_spacing
y = self.offset_y y = self.offset_y
x_end = (self.offset_x+(end-start)*self.distance+start*self.distance x_end = (
- self.arrow_spacing*(self.highest_level-level)/4) self.offset_x
+ (end - start) * self.distance
+ start * self.distance
- self.arrow_spacing * (self.highest_level - level) / 4
)
y_curve = self.offset_y - level * self.distance / 2 y_curve = self.offset_y - level * self.distance / 2
if self.compact: if self.compact:
y_curve = self.offset_y - level * self.distance / 6 y_curve = self.offset_y - level * self.distance / 6
@ -115,8 +124,14 @@ class DependencyRenderer(object):
y_curve = -self.distance y_curve = -self.distance
arrowhead = self.get_arrowhead(direction, x_start, y, x_end) arrowhead = self.get_arrowhead(direction, x_start, y, x_end)
arc = self.get_arc(x_start, y, y_curve, x_end) arc = self.get_arc(x_start, y, y_curve, x_end)
return TPL_DEP_ARCS.format(id=self.id, i=i, stroke=self.arrow_stroke, return TPL_DEP_ARCS.format(
head=arrowhead, label=label, arc=arc) id=self.id,
i=i,
stroke=self.arrow_stroke,
head=arrowhead,
label=label,
arc=arc,
)
def get_arc(self, x_start, y, y_curve, x_end): def get_arc(self, x_start, y, y_curve, x_end):
"""Render individual arc. """Render individual arc.
@ -141,13 +156,22 @@ class DependencyRenderer(object):
end (int): X-coordinate of arrow end point. end (int): X-coordinate of arrow end point.
RETURNS (unicode): Definition of the arrow head path ('d' attribute). RETURNS (unicode): Definition of the arrow head path ('d' attribute).
""" """
if direction == 'left': if direction == "left":
pos1, pos2, pos3 = (x, x - self.arrow_width + 2, x + self.arrow_width - 2) pos1, pos2, pos3 = (x, x - self.arrow_width + 2, x + self.arrow_width - 2)
else: else:
pos1, pos2, pos3 = (end, end+self.arrow_width-2, pos1, pos2, pos3 = (
end-self.arrow_width+2) end,
arrowhead = (pos1, y+2, pos2, y-self.arrow_width, pos3, end + self.arrow_width - 2,
y-self.arrow_width) end - self.arrow_width + 2,
)
arrowhead = (
pos1,
y + 2,
pos2,
y - self.arrow_width,
pos3,
y - self.arrow_width,
)
return "M{},{} L{},{} {},{}".format(*arrowhead) return "M{},{} L{},{} {},{}".format(*arrowhead)
def get_levels(self, arcs): def get_levels(self, arcs):
@ -157,30 +181,44 @@ class DependencyRenderer(object):
args (list): Individual arcs and their start, end, direction and label. args (list): Individual arcs and their start, end, direction and label.
RETURNS (list): Arc levels sorted from lowest to highest. RETURNS (list): Arc levels sorted from lowest to highest.
""" """
levels = set(map(lambda arc: arc['end'] - arc['start'], arcs)) levels = set(map(lambda arc: arc["end"] - arc["start"], arcs))
return sorted(list(levels)) return sorted(list(levels))
class EntityRenderer(object): class EntityRenderer(object):
"""Render named entities as HTML.""" """Render named entities as HTML."""
style = 'ent'
style = "ent"
def __init__(self, options={}): def __init__(self, options={}):
"""Initialise dependency renderer. """Initialise dependency renderer.
options (dict): Visualiser-specific options (colors, ents) options (dict): Visualiser-specific options (colors, ents)
""" """
colors = {'ORG': '#7aecec', 'PRODUCT': '#bfeeb7', 'GPE': '#feca74', colors = {
'LOC': '#ff9561', 'PERSON': '#aa9cfc', 'NORP': '#c887fb', "ORG": "#7aecec",
'FACILITY': '#9cc9cc', 'EVENT': '#ffeb80', 'LAW': '#ff8197', "PRODUCT": "#bfeeb7",
'LANGUAGE': '#ff8197', 'WORK_OF_ART': '#f0d0ff', "GPE": "#feca74",
'DATE': '#bfe1d9', 'TIME': '#bfe1d9', 'MONEY': '#e4e7d2', "LOC": "#ff9561",
'QUANTITY': '#e4e7d2', 'ORDINAL': '#e4e7d2', "PERSON": "#aa9cfc",
'CARDINAL': '#e4e7d2', 'PERCENT': '#e4e7d2'} "NORP": "#c887fb",
colors.update(options.get('colors', {})) "FACILITY": "#9cc9cc",
self.default_color = '#ddd' "EVENT": "#ffeb80",
"LAW": "#ff8197",
"LANGUAGE": "#ff8197",
"WORK_OF_ART": "#f0d0ff",
"DATE": "#bfe1d9",
"TIME": "#bfe1d9",
"MONEY": "#e4e7d2",
"QUANTITY": "#e4e7d2",
"ORDINAL": "#e4e7d2",
"CARDINAL": "#e4e7d2",
"PERCENT": "#e4e7d2",
}
colors.update(options.get("colors", {}))
self.default_color = "#ddd"
self.colors = colors self.colors = colors
self.ents = options.get('ents', None) self.ents = options.get("ents", None)
def render(self, parsed, page=False, minify=False): def render(self, parsed, page=False, minify=False):
"""Render complete markup. """Render complete markup.
@ -190,14 +228,14 @@ class EntityRenderer(object):
minify (bool): Minify HTML markup. minify (bool): Minify HTML markup.
RETURNS (unicode): Rendered HTML markup. RETURNS (unicode): Rendered HTML markup.
""" """
rendered = [self.render_ents(p['text'], p['ents'], rendered = [
p.get('title', None)) for p in parsed] self.render_ents(p["text"], p["ents"], p.get("title", None)) for p in parsed
]
if page: if page:
docs = ''.join([TPL_FIGURE.format(content=doc) docs = "".join([TPL_FIGURE.format(content=doc) for doc in rendered])
for doc in rendered])
markup = TPL_PAGE.format(content=docs) markup = TPL_PAGE.format(content=docs)
else: else:
markup = ''.join(rendered) markup = "".join(rendered)
if minify: if minify:
return minify_html(markup) return minify_html(markup)
return markup return markup
@ -209,18 +247,18 @@ class EntityRenderer(object):
spans (list): Individual entity spans and their start, end and label. spans (list): Individual entity spans and their start, end and label.
title (unicode or None): Document title set in Doc.user_data['title']. title (unicode or None): Document title set in Doc.user_data['title'].
""" """
markup = '' markup = ""
offset = 0 offset = 0
for span in spans: for span in spans:
label = span['label'] label = span["label"]
start = span['start'] start = span["start"]
end = span['end'] end = span["end"]
entity = text[start:end] entity = text[start:end]
fragments = text[offset:start].split('\n') fragments = text[offset:start].split("\n")
for i, fragment in enumerate(fragments): for i, fragment in enumerate(fragments):
markup += fragment markup += fragment
if len(fragments) > 1 and i != len(fragments) - 1: if len(fragments) > 1 and i != len(fragments) - 1:
markup += '</br>' markup += "</br>"
if self.ents is None or label.upper() in self.ents: if self.ents is None or label.upper() in self.ents:
color = self.colors.get(label.upper(), self.default_color) color = self.colors.get(label.upper(), self.default_color)
markup += TPL_ENT.format(label=label, text=entity, bg=color) markup += TPL_ENT.format(label=label, text=entity, bg=color)
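The `EntityRenderer` above (note `style = "ent"`) is the class behind `displacy.render(..., style="ent")`. A minimal usage sketch — the model name is only an example, any pipeline with an NER component works:

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")  # example model; any pipeline with NER works
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# render_ents() runs per document; page=True wraps the result in TPL_PAGE,
# minify=False keeps the returned markup human-readable.
html = displacy.render(doc, style="ent", page=True, minify=False)
```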

@ -2,7 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
# setting explicit height and max-width: none on the SVG is required for # Setting explicit height and max-width: none on the SVG is required for
# Jupyter to render it properly in a cell # Jupyter to render it properly in a cell
TPL_DEP_SVG = """ TPL_DEP_SVG = """

@ -8,13 +8,17 @@ import inspect
def add_codes(err_cls): def add_codes(err_cls):
"""Add error codes to string messages via class attribute names.""" """Add error codes to string messages via class attribute names."""
class ErrorsWithCodes(object): class ErrorsWithCodes(object):
def __getattribute__(self, code): def __getattribute__(self, code):
msg = getattr(err_cls, code) msg = getattr(err_cls, code)
return '[{code}] {msg}'.format(code=code, msg=msg) return "[{code}] {msg}".format(code=code, msg=msg)
return ErrorsWithCodes() return ErrorsWithCodes()
# fmt: off
@add_codes @add_codes
class Warnings(object): class Warnings(object):
W001 = ("As of spaCy v2.0, the keyword argument `path=` is deprecated. " W001 = ("As of spaCy v2.0, the keyword argument `path=` is deprecated. "
@ -275,6 +279,7 @@ class Errors(object):
" can only be part of one entity, so make sure the entities you're " " can only be part of one entity, so make sure the entities you're "
"setting don't overlap.") "setting don't overlap.")
@add_codes @add_codes
class TempErrors(object): class TempErrors(object):
T001 = ("Max length currently 10 for phrase matching") T001 = ("Max length currently 10 for phrase matching")
@ -292,55 +297,57 @@ class TempErrors(object):
"(pretrained_dims) but not the new name (pretrained_vectors).") "(pretrained_dims) but not the new name (pretrained_vectors).")
# fmt: on
class ModelsWarning(UserWarning): class ModelsWarning(UserWarning):
pass pass
WARNINGS = { WARNINGS = {
'user': UserWarning, "user": UserWarning,
'deprecation': DeprecationWarning, "deprecation": DeprecationWarning,
'models': ModelsWarning, "models": ModelsWarning,
} }
def _get_warn_types(arg): def _get_warn_types(arg):
if arg == '': # don't show any warnings if arg == "": # don't show any warnings
return [] return []
if not arg or arg == 'all': # show all available warnings if not arg or arg == "all": # show all available warnings
return WARNINGS.keys() return WARNINGS.keys()
return [w_type.strip() for w_type in arg.split(',') return [w_type.strip() for w_type in arg.split(",") if w_type.strip() in WARNINGS]
if w_type.strip() in WARNINGS]
def _get_warn_excl(arg): def _get_warn_excl(arg):
if not arg: if not arg:
return [] return []
return [w_id.strip() for w_id in arg.split(',')] return [w_id.strip() for w_id in arg.split(",")]
SPACY_WARNING_FILTER = os.environ.get('SPACY_WARNING_FILTER') SPACY_WARNING_FILTER = os.environ.get("SPACY_WARNING_FILTER")
SPACY_WARNING_TYPES = _get_warn_types(os.environ.get('SPACY_WARNING_TYPES')) SPACY_WARNING_TYPES = _get_warn_types(os.environ.get("SPACY_WARNING_TYPES"))
SPACY_WARNING_IGNORE = _get_warn_excl(os.environ.get('SPACY_WARNING_IGNORE')) SPACY_WARNING_IGNORE = _get_warn_excl(os.environ.get("SPACY_WARNING_IGNORE"))
def user_warning(message): def user_warning(message):
_warn(message, 'user') _warn(message, "user")
def deprecation_warning(message): def deprecation_warning(message):
_warn(message, 'deprecation') _warn(message, "deprecation")
def models_warning(message): def models_warning(message):
_warn(message, 'models') _warn(message, "models")
def _warn(message, warn_type='user'): def _warn(message, warn_type="user"):
""" """
message (unicode): The message to display. message (unicode): The message to display.
category (Warning): The Warning to show. category (Warning): The Warning to show.
""" """
w_id = message.split('[', 1)[1].split(']', 1)[0] # get ID from string w_id = message.split("[", 1)[1].split("]", 1)[0] # get ID from string
if warn_type in SPACY_WARNING_TYPES and w_id not in SPACY_WARNING_IGNORE: if warn_type in SPACY_WARNING_TYPES and w_id not in SPACY_WARNING_IGNORE:
category = WARNINGS[warn_type] category = WARNINGS[warn_type]
stack = inspect.stack()[-1] stack = inspect.stack()[-1]
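The `# fmt: off` / `# fmt: on` markers added around the `Warnings`, `Errors` and `TempErrors` classes above tell `black` to leave everything between them untouched, so hand-wrapped message strings keep their layout. A small illustrative sketch (the dictionary below is made up, not taken from the diff):

```python
# fmt: off
# black will not reflow or re-indent anything between the markers,
# so manual alignment survives auto-formatting.
MESSAGES = {
    "W001":  "deprecated keyword argument",    # hypothetical codes,
    "E1001": "entity spans must not overlap",  # for illustration only
}
# fmt: on

# Outside the markers, black formats as usual.
numbers = [1, 2, 3]
```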

@ -21,295 +21,272 @@ GLOSSARY = {
# POS tags # POS tags
# Universal POS Tags # Universal POS Tags
# http://universaldependencies.org/u/pos/ # http://universaldependencies.org/u/pos/
"ADJ": "adjective",
'ADJ': 'adjective', "ADP": "adposition",
'ADP': 'adposition', "ADV": "adverb",
'ADV': 'adverb', "AUX": "auxiliary",
'AUX': 'auxiliary', "CONJ": "conjunction",
'CONJ': 'conjunction', "CCONJ": "coordinating conjunction",
'CCONJ': 'coordinating conjunction', "DET": "determiner",
'DET': 'determiner', "INTJ": "interjection",
'INTJ': 'interjection', "NOUN": "noun",
'NOUN': 'noun', "NUM": "numeral",
'NUM': 'numeral', "PART": "particle",
'PART': 'particle', "PRON": "pronoun",
'PRON': 'pronoun', "PROPN": "proper noun",
'PROPN': 'proper noun', "PUNCT": "punctuation",
'PUNCT': 'punctuation', "SCONJ": "subordinating conjunction",
'SCONJ': 'subordinating conjunction', "SYM": "symbol",
'SYM': 'symbol', "VERB": "verb",
'VERB': 'verb', "X": "other",
'X': 'other', "EOL": "end of line",
'EOL': 'end of line', "SPACE": "space",
'SPACE': 'space',
# POS tags (English) # POS tags (English)
# OntoNotes 5 / Penn Treebank # OntoNotes 5 / Penn Treebank
# https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html # https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
".": "punctuation mark, sentence closer",
'.': 'punctuation mark, sentence closer', ",": "punctuation mark, comma",
',': 'punctuation mark, comma', "-LRB-": "left round bracket",
'-LRB-': 'left round bracket', "-RRB-": "right round bracket",
'-RRB-': 'right round bracket', "``": "opening quotation mark",
'``': 'opening quotation mark', '""': "closing quotation mark",
'""': 'closing quotation mark', "''": "closing quotation mark",
"''": 'closing quotation mark', ":": "punctuation mark, colon or ellipsis",
':': 'punctuation mark, colon or ellipsis', "$": "symbol, currency",
'$': 'symbol, currency', "#": "symbol, number sign",
'#': 'symbol, number sign', "AFX": "affix",
'AFX': 'affix', "CC": "conjunction, coordinating",
'CC': 'conjunction, coordinating', "CD": "cardinal number",
'CD': 'cardinal number', "DT": "determiner",
'DT': 'determiner', "EX": "existential there",
'EX': 'existential there', "FW": "foreign word",
'FW': 'foreign word', "HYPH": "punctuation mark, hyphen",
'HYPH': 'punctuation mark, hyphen', "IN": "conjunction, subordinating or preposition",
'IN': 'conjunction, subordinating or preposition', "JJ": "adjective",
'JJ': 'adjective', "JJR": "adjective, comparative",
'JJR': 'adjective, comparative', "JJS": "adjective, superlative",
'JJS': 'adjective, superlative', "LS": "list item marker",
'LS': 'list item marker', "MD": "verb, modal auxiliary",
'MD': 'verb, modal auxiliary', "NIL": "missing tag",
'NIL': 'missing tag', "NN": "noun, singular or mass",
'NN': 'noun, singular or mass', "NNP": "noun, proper singular",
'NNP': 'noun, proper singular', "NNPS": "noun, proper plural",
'NNPS': 'noun, proper plural', "NNS": "noun, plural",
'NNS': 'noun, plural', "PDT": "predeterminer",
'PDT': 'predeterminer', "POS": "possessive ending",
'POS': 'possessive ending', "PRP": "pronoun, personal",
'PRP': 'pronoun, personal', "PRP$": "pronoun, possessive",
'PRP$': 'pronoun, possessive', "RB": "adverb",
'RB': 'adverb', "RBR": "adverb, comparative",
'RBR': 'adverb, comparative', "RBS": "adverb, superlative",
'RBS': 'adverb, superlative', "RP": "adverb, particle",
'RP': 'adverb, particle', "TO": "infinitival to",
'TO': 'infinitival to', "UH": "interjection",
'UH': 'interjection', "VB": "verb, base form",
'VB': 'verb, base form', "VBD": "verb, past tense",
'VBD': 'verb, past tense', "VBG": "verb, gerund or present participle",
'VBG': 'verb, gerund or present participle', "VBN": "verb, past participle",
'VBN': 'verb, past participle', "VBP": "verb, non-3rd person singular present",
'VBP': 'verb, non-3rd person singular present', "VBZ": "verb, 3rd person singular present",
'VBZ': 'verb, 3rd person singular present', "WDT": "wh-determiner",
'WDT': 'wh-determiner', "WP": "wh-pronoun, personal",
'WP': 'wh-pronoun, personal', "WP$": "wh-pronoun, possessive",
'WP$': 'wh-pronoun, possessive', "WRB": "wh-adverb",
'WRB': 'wh-adverb', "SP": "space",
'SP': 'space', "ADD": "email",
'ADD': 'email', "NFP": "superfluous punctuation",
'NFP': 'superfluous punctuation', "GW": "additional word in multi-word expression",
'GW': 'additional word in multi-word expression', "XX": "unknown",
'XX': 'unknown', "BES": 'auxiliary "be"',
'BES': 'auxiliary "be"', "HVS": 'forms of "have"',
'HVS': 'forms of "have"',
# POS Tags (German) # POS Tags (German)
# TIGER Treebank # TIGER Treebank
# http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf # http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf
"$(": "other sentence-internal punctuation mark",
'$(': 'other sentence-internal punctuation mark', "$,": "comma",
'$,': 'comma', "$.": "sentence-final punctuation mark",
'$.': 'sentence-final punctuation mark', "ADJA": "adjective, attributive",
'ADJA': 'adjective, attributive', "ADJD": "adjective, adverbial or predicative",
'ADJD': 'adjective, adverbial or predicative', "APPO": "postposition",
'APPO': 'postposition', "APPR": "preposition; circumposition left",
'APPR': 'preposition; circumposition left', "APPRART": "preposition with article",
'APPRART': 'preposition with article', "APZR": "circumposition right",
'APZR': 'circumposition right', "ART": "definite or indefinite article",
'ART': 'definite or indefinite article', "CARD": "cardinal number",
'CARD': 'cardinal number', "FM": "foreign language material",
'FM': 'foreign language material', "ITJ": "interjection",
'ITJ': 'interjection', "KOKOM": "comparative conjunction",
'KOKOM': 'comparative conjunction', "KON": "coordinate conjunction",
'KON': 'coordinate conjunction', "KOUI": 'subordinate conjunction with "zu" and infinitive',
'KOUI': 'subordinate conjunction with "zu" and infinitive', "KOUS": "subordinate conjunction with sentence",
'KOUS': 'subordinate conjunction with sentence', "NE": "proper noun",
'NE': 'proper noun', "NNE": "proper noun",
'NNE': 'proper noun', "PAV": "pronominal adverb",
'PAV': 'pronominal adverb', "PROAV": "pronominal adverb",
'PROAV': 'pronominal adverb', "PDAT": "attributive demonstrative pronoun",
'PDAT': 'attributive demonstrative pronoun', "PDS": "substituting demonstrative pronoun",
'PDS': 'substituting demonstrative pronoun', "PIAT": "attributive indefinite pronoun without determiner",
'PIAT': 'attributive indefinite pronoun without determiner', "PIDAT": "attributive indefinite pronoun with determiner",
'PIDAT': 'attributive indefinite pronoun with determiner', "PIS": "substituting indefinite pronoun",
'PIS': 'substituting indefinite pronoun', "PPER": "non-reflexive personal pronoun",
'PPER': 'non-reflexive personal pronoun', "PPOSAT": "attributive possessive pronoun",
'PPOSAT': 'attributive possessive pronoun', "PPOSS": "substituting possessive pronoun",
'PPOSS': 'substituting possessive pronoun', "PRELAT": "attributive relative pronoun",
'PRELAT': 'attributive relative pronoun', "PRELS": "substituting relative pronoun",
'PRELS': 'substituting relative pronoun', "PRF": "reflexive personal pronoun",
'PRF': 'reflexive personal pronoun', "PTKA": "particle with adjective or adverb",
'PTKA': 'particle with adjective or adverb', "PTKANT": "answer particle",
'PTKANT': 'answer particle', "PTKNEG": "negative particle",
'PTKNEG': 'negative particle', "PTKVZ": "separable verbal particle",
'PTKVZ': 'separable verbal particle', "PTKZU": '"zu" before infinitive',
'PTKZU': '"zu" before infinitive', "PWAT": "attributive interrogative pronoun",
'PWAT': 'attributive interrogative pronoun', "PWAV": "adverbial interrogative or relative pronoun",
'PWAV': 'adverbial interrogative or relative pronoun', "PWS": "substituting interrogative pronoun",
'PWS': 'substituting interrogative pronoun', "TRUNC": "word remnant",
'TRUNC': 'word remnant', "VAFIN": "finite verb, auxiliary",
'VAFIN': 'finite verb, auxiliary', "VAIMP": "imperative, auxiliary",
'VAIMP': 'imperative, auxiliary', "VAINF": "infinitive, auxiliary",
'VAINF': 'infinitive, auxiliary', "VAPP": "perfect participle, auxiliary",
'VAPP': 'perfect participle, auxiliary', "VMFIN": "finite verb, modal",
'VMFIN': 'finite verb, modal', "VMINF": "infinitive, modal",
'VMINF': 'infinitive, modal', "VMPP": "perfect participle, modal",
'VMPP': 'perfect participle, modal', "VVFIN": "finite verb, full",
'VVFIN': 'finite verb, full', "VVIMP": "imperative, full",
'VVIMP': 'imperative, full', "VVINF": "infinitive, full",
'VVINF': 'infinitive, full', "VVIZU": 'infinitive with "zu", full',
'VVIZU': 'infinitive with "zu", full', "VVPP": "perfect participle, full",
'VVPP': 'perfect participle, full', "XY": "non-word containing non-letter",
'XY': 'non-word containing non-letter',
# Noun chunks # Noun chunks
"NP": "noun phrase",
'NP': 'noun phrase', "PP": "prepositional phrase",
'PP': 'prepositional phrase', "VP": "verb phrase",
'VP': 'verb phrase', "ADVP": "adverb phrase",
'ADVP': 'adverb phrase', "ADJP": "adjective phrase",
'ADJP': 'adjective phrase', "SBAR": "subordinating conjunction",
'SBAR': 'subordinating conjunction', "PRT": "particle",
'PRT': 'particle', "PNP": "prepositional noun phrase",
'PNP': 'prepositional noun phrase',
# Dependency Labels (English) # Dependency Labels (English)
# ClearNLP / Universal Dependencies # ClearNLP / Universal Dependencies
# https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md # https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
"acomp": "adjectival complement",
'acomp': 'adjectival complement', "advcl": "adverbial clause modifier",
'advcl': 'adverbial clause modifier', "advmod": "adverbial modifier",
'advmod': 'adverbial modifier', "agent": "agent",
'agent': 'agent', "amod": "adjectival modifier",
'amod': 'adjectival modifier', "appos": "appositional modifier",
'appos': 'appositional modifier', "attr": "attribute",
'attr': 'attribute', "aux": "auxiliary",
'aux': 'auxiliary', "auxpass": "auxiliary (passive)",
'auxpass': 'auxiliary (passive)', "cc": "coordinating conjunction",
'cc': 'coordinating conjunction', "ccomp": "clausal complement",
'ccomp': 'clausal complement', "complm": "complementizer",
'complm': 'complementizer', "conj": "conjunct",
'conj': 'conjunct', "cop": "copula",
'cop': 'copula', "csubj": "clausal subject",
'csubj': 'clausal subject', "csubjpass": "clausal subject (passive)",
'csubjpass': 'clausal subject (passive)', "dep": "unclassified dependent",
'dep': 'unclassified dependent', "det": "determiner",
'det': 'determiner', "dobj": "direct object",
'dobj': 'direct object', "expl": "expletive",
'expl': 'expletive', "hmod": "modifier in hyphenation",
'hmod': 'modifier in hyphenation', "hyph": "hyphen",
'hyph': 'hyphen', "infmod": "infinitival modifier",
'infmod': 'infinitival modifier', "intj": "interjection",
'intj': 'interjection', "iobj": "indirect object",
'iobj': 'indirect object', "mark": "marker",
'mark': 'marker', "meta": "meta modifier",
'meta': 'meta modifier', "neg": "negation modifier",
'neg': 'negation modifier', "nmod": "modifier of nominal",
'nmod': 'modifier of nominal', "nn": "noun compound modifier",
'nn': 'noun compound modifier', "npadvmod": "noun phrase as adverbial modifier",
'npadvmod': 'noun phrase as adverbial modifier', "nsubj": "nominal subject",
'nsubj': 'nominal subject', "nsubjpass": "nominal subject (passive)",
'nsubjpass': 'nominal subject (passive)', "num": "number modifier",
'num': 'number modifier', "number": "number compound modifier",
'number': 'number compound modifier', "oprd": "object predicate",
'oprd': 'object predicate', "obj": "object",
'obj': 'object', "obl": "oblique nominal",
'obl': 'oblique nominal', "parataxis": "parataxis",
'parataxis': 'parataxis', "partmod": "participal modifier",
'partmod': 'participal modifier', "pcomp": "complement of preposition",
'pcomp': 'complement of preposition', "pobj": "object of preposition",
'pobj': 'object of preposition', "poss": "possession modifier",
'poss': 'possession modifier', "possessive": "possessive modifier",
'possessive': 'possessive modifier', "preconj": "pre-correlative conjunction",
'preconj': 'pre-correlative conjunction', "prep": "prepositional modifier",
'prep': 'prepositional modifier', "prt": "particle",
'prt': 'particle', "punct": "punctuation",
'punct': 'punctuation', "quantmod": "modifier of quantifier",
'quantmod': 'modifier of quantifier', "rcmod": "relative clause modifier",
'rcmod': 'relative clause modifier', "root": "root",
'root': 'root', "xcomp": "open clausal complement",
'xcomp': 'open clausal complement',
# Dependency labels (German) # Dependency labels (German)
# TIGER Treebank # TIGER Treebank
# http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf # http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf
# currently missing: 'cc' (comparative complement) because of conflict # currently missing: 'cc' (comparative complement) because of conflict
# with English labels # with English labels
"ac": "adpositional case marker",
'ac': 'adpositional case marker', "adc": "adjective component",
'adc': 'adjective component', "ag": "genitive attribute",
'ag': 'genitive attribute', "ams": "measure argument of adjective",
'ams': 'measure argument of adjective', "app": "apposition",
'app': 'apposition', "avc": "adverbial phrase component",
'avc': 'adverbial phrase component', "cd": "coordinating conjunction",
'cd': 'coordinating conjunction', "cj": "conjunct",
'cj': 'conjunct', "cm": "comparative conjunction",
'cm': 'comparative conjunction', "cp": "complementizer",
'cp': 'complementizer', "cvc": "collocational verb construction",
'cvc': 'collocational verb construction', "da": "dative",
'da': 'dative', "dh": "discourse-level head",
'dh': 'discourse-level head', "dm": "discourse marker",
'dm': 'discourse marker', "ep": "expletive es",
'ep': 'expletive es', "hd": "head",
'hd': 'head', "ju": "junctor",
'ju': 'junctor', "mnr": "postnominal modifier",
'mnr': 'postnominal modifier', "mo": "modifier",
'mo': 'modifier', "ng": "negation",
'ng': 'negation', "nk": "noun kernel element",
'nk': 'noun kernel element', "nmc": "numerical component",
'nmc': 'numerical component', "oa": "accusative object",
'oa': 'accusative object', "oc": "clausal object",
'oc': 'clausal object', "og": "genitive object",
'og': 'genitive object', "op": "prepositional object",
'op': 'prepositional object', "par": "parenthetical element",
'par': 'parenthetical element', "pd": "predicate",
'pd': 'predicate', "pg": "phrasal genitive",
'pg': 'phrasal genitive', "ph": "placeholder",
'ph': 'placeholder', "pm": "morphological particle",
'pm': 'morphological particle', "pnc": "proper noun component",
'pnc': 'proper noun component', "rc": "relative clause",
'rc': 'relative clause', "re": "repeated element",
're': 'repeated element', "rs": "reported speech",
'rs': 'reported speech', "sb": "subject",
'sb': 'subject',
# Named Entity Recognition # Named Entity Recognition
# OntoNotes 5 # OntoNotes 5
# https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf # https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf
"PERSON": "People, including fictional",
'PERSON': 'People, including fictional', "NORP": "Nationalities or religious or political groups",
'NORP': 'Nationalities or religious or political groups', "FACILITY": "Buildings, airports, highways, bridges, etc.",
'FACILITY': 'Buildings, airports, highways, bridges, etc.', "FAC": "Buildings, airports, highways, bridges, etc.",
'FAC': 'Buildings, airports, highways, bridges, etc.', "ORG": "Companies, agencies, institutions, etc.",
'ORG': 'Companies, agencies, institutions, etc.', "GPE": "Countries, cities, states",
'GPE': 'Countries, cities, states', "LOC": "Non-GPE locations, mountain ranges, bodies of water",
'LOC': 'Non-GPE locations, mountain ranges, bodies of water', "PRODUCT": "Objects, vehicles, foods, etc. (not services)",
'PRODUCT': 'Objects, vehicles, foods, etc. (not services)', "EVENT": "Named hurricanes, battles, wars, sports events, etc.",
'EVENT': 'Named hurricanes, battles, wars, sports events, etc.', "WORK_OF_ART": "Titles of books, songs, etc.",
'WORK_OF_ART': 'Titles of books, songs, etc.', "LAW": "Named documents made into laws.",
'LAW': 'Named documents made into laws.', "LANGUAGE": "Any named language",
'LANGUAGE': 'Any named language', "DATE": "Absolute or relative dates or periods",
'DATE': 'Absolute or relative dates or periods', "TIME": "Times smaller than a day",
'TIME': 'Times smaller than a day', "PERCENT": 'Percentage, including "%"',
'PERCENT': 'Percentage, including "%"', "MONEY": "Monetary values, including unit",
'MONEY': 'Monetary values, including unit', "QUANTITY": "Measurements, as of weight or distance",
'QUANTITY': 'Measurements, as of weight or distance', "ORDINAL": '"first", "second", etc.',
'ORDINAL': '"first", "second", etc.', "CARDINAL": "Numerals that do not fall under another type",
'CARDINAL': 'Numerals that do not fall under another type',
# Named Entity Recognition # Named Entity Recognition
# Wikipedia # Wikipedia
# http://www.sciencedirect.com/science/article/pii/S0004370212000276 # http://www.sciencedirect.com/science/article/pii/S0004370212000276
# https://pdfs.semanticscholar.org/5744/578cc243d92287f47448870bb426c66cc941.pdf # https://pdfs.semanticscholar.org/5744/578cc243d92287f47448870bb426c66cc941.pdf
"PER": "Named person or family.",
'PER': 'Named person or family.', "MISC": "Miscellaneous entities, e.g. events, nationalities, products or works of art",
'MISC': ('Miscellaneous entities, e.g. events, nationalities, '
'products or works of art'),
} }

@ -16,16 +16,18 @@ from ...util import update_exc, add_lookups
class ArabicDefaults(Language.Defaults): class ArabicDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS) lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = lambda text: 'ar' lex_attr_getters[LANG] = lambda text: "ar"
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS) lex_attr_getters[NORM] = add_lookups(
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
stop_words = STOP_WORDS stop_words = STOP_WORDS
suffixes = TOKENIZER_SUFFIXES suffixes = TOKENIZER_SUFFIXES
class Arabic(Language): class Arabic(Language):
lang = 'ar' lang = "ar"
Defaults = ArabicDefaults Defaults = ArabicDefaults
__all__ = ['Arabic'] __all__ = ["Arabic"]

@ -10,11 +10,11 @@ Example sentences to test spaCy and its language models.
sentences = [ sentences = [
"نال الكاتب خالد توفيق جائزة الرواية العربية في معرض الشارقة الدولي للكتاب", "نال الكاتب خالد توفيق جائزة الرواية العربية في معرض الشارقة الدولي للكتاب",
"أين تقع دمشق ؟" "أين تقع دمشق ؟",
"كيف حالك ؟", "كيف حالك ؟",
"هل يمكن ان نلتقي على الساعة الثانية عشرة ظهرا ؟", "هل يمكن ان نلتقي على الساعة الثانية عشرة ظهرا ؟",
"ماهي أبرز التطورات السياسية، الأمنية والاجتماعية في العالم ؟", "ماهي أبرز التطورات السياسية، الأمنية والاجتماعية في العالم ؟",
"هل بالإمكان أن نلتقي غدا؟", "هل بالإمكان أن نلتقي غدا؟",
"هناك نحو 382 مليون شخص مصاب بداء السكَّري في العالم", "هناك نحو 382 مليون شخص مصاب بداء السكَّري في العالم",
"كشفت دراسة حديثة أن الخيل تقرأ تعبيرات الوجه وتستطيع أن تتذكر مشاعر الناس وعواطفهم" "كشفت دراسة حديثة أن الخيل تقرأ تعبيرات الوجه وتستطيع أن تتذكر مشاعر الناس وعواطفهم",
] ]

@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from ...attrs import LIKE_NUM from ...attrs import LIKE_NUM
_num_words = set(""" _num_words = set(
"""
صفر صفر
واحد واحد
إثنان إثنان
@ -52,9 +53,11 @@ _num_words = set("""
مليون مليون
مليار مليار
مليارات مليارات
""".split()) """.split()
)
_ordinal_words = set(""" _ordinal_words = set(
"""
اول اول
أول أول
حاد حاد
@ -69,20 +72,21 @@ _ordinal_words = set("""
ثامن ثامن
تاسع تاسع
عاشر عاشر
""".split()) """.split()
)
def like_num(text): def like_num(text):
""" """
check if text resembles a number Check if text resembles a number
""" """
if text.startswith(('+', '-', '±', '~')): if text.startswith(("+", "-", "±", "~")):
text = text[1:] text = text[1:]
text = text.replace(',', '').replace('.', '') text = text.replace(",", "").replace(".", "")
if text.isdigit(): if text.isdigit():
return True return True
if text.count('/') == 1: if text.count("/") == 1:
num, denom = text.split('/') num, denom = text.split("/")
if num.isdigit() and denom.isdigit(): if num.isdigit() and denom.isdigit():
return True return True
if text in _num_words: if text in _num_words:
@ -92,6 +96,4 @@ def like_num(text):
return False return False
LEX_ATTRS = { LEX_ATTRS = {LIKE_NUM: like_num}
LIKE_NUM: like_num
}
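A quick way to sanity-check a `like_num` implementation like the one above is to call it directly; an ad-hoc example, not part of the test suite:

```python
from spacy.lang.ar.lex_attrs import like_num

assert like_num("10")        # plain digits
assert like_num("10.5")      # separators are stripped before the digit check
assert like_num("1/2")       # simple fractions
assert like_num("واحد")       # number word listed in _num_words
assert not like_num("دمشق")   # an ordinary word (here a city name) is rejected
```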

@ -1,15 +1,20 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
from ..punctuation import TOKENIZER_INFIXES
from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, CURRENCY from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, CURRENCY
from ..char_classes import QUOTES, UNITS, ALPHA, ALPHA_LOWER, ALPHA_UPPER from ..char_classes import UNITS, ALPHA_UPPER
_suffixes = (LIST_PUNCT + LIST_ELLIPSES + LIST_QUOTES + _suffixes = (
[r'(?<=[0-9])\+', LIST_PUNCT
+ LIST_ELLIPSES
+ LIST_QUOTES
+ [
r"(?<=[0-9])\+",
# Arabic is written from Right-To-Left # Arabic is written from Right-To-Left
r'(?<=[0-9])(?:{})'.format(CURRENCY), r"(?<=[0-9])(?:{})".format(CURRENCY),
r'(?<=[0-9])(?:{})'.format(UNITS), r"(?<=[0-9])(?:{})".format(UNITS),
r'(?<=[{au}][{au}])\.'.format(au=ALPHA_UPPER)]) r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
]
)
TOKENIZER_SUFFIXES = _suffixes TOKENIZER_SUFFIXES = _suffixes

@ -1,7 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
STOP_WORDS = set(""" STOP_WORDS = set(
"""
من من
نحو نحو
لعل لعل
@ -388,4 +389,5 @@ STOP_WORDS = set("""
وإن وإن
ولو ولو
يا يا
""".split()) """.split()
)

@ -1,21 +1,23 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA from ...symbols import ORTH, LEMMA
import re
_exc = {} _exc = {}
# time
# Time
for exc_data in [ for exc_data in [
{LEMMA: "قبل الميلاد", ORTH: "ق.م"}, {LEMMA: "قبل الميلاد", ORTH: "ق.م"},
{LEMMA: "بعد الميلاد", ORTH: "ب. م"}, {LEMMA: "بعد الميلاد", ORTH: "ب. م"},
{LEMMA: "ميلادي", ORTH: ""}, {LEMMA: "ميلادي", ORTH: ""},
{LEMMA: "هجري", ORTH: ".هـ"}, {LEMMA: "هجري", ORTH: ".هـ"},
{LEMMA: "توفي", ORTH: ""}]: {LEMMA: "توفي", ORTH: ""},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
# scientific abv. # Scientific abv.
for exc_data in [ for exc_data in [
{LEMMA: "صلى الله عليه وسلم", ORTH: "صلعم"}, {LEMMA: "صلى الله عليه وسلم", ORTH: "صلعم"},
{LEMMA: "الشارح", ORTH: "الشـ"}, {LEMMA: "الشارح", ORTH: "الشـ"},
@ -28,20 +30,20 @@ for exc_data in [
{LEMMA: "أنبأنا", ORTH: "أنا"}, {LEMMA: "أنبأنا", ORTH: "أنا"},
{LEMMA: "أخبرنا", ORTH: "نا"}, {LEMMA: "أخبرنا", ORTH: "نا"},
{LEMMA: "مصدر سابق", ORTH: "م. س"}, {LEMMA: "مصدر سابق", ORTH: "م. س"},
{LEMMA: "مصدر نفسه", ORTH: "م. ن"}]: {LEMMA: "مصدر نفسه", ORTH: "م. ن"},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
# other abv. # Other abv.
for exc_data in [ for exc_data in [
{LEMMA: "دكتور", ORTH: "د."}, {LEMMA: "دكتور", ORTH: "د."},
{LEMMA: "أستاذ دكتور", ORTH: "أ.د"}, {LEMMA: "أستاذ دكتور", ORTH: "أ.د"},
{LEMMA: "أستاذ", ORTH: "أ."}, {LEMMA: "أستاذ", ORTH: "أ."},
{LEMMA: "بروفيسور", ORTH: "ب."}]: {LEMMA: "بروفيسور", ORTH: "ب."},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
for exc_data in [ for exc_data in [{LEMMA: "تلفون", ORTH: "ت."}, {LEMMA: "صندوق بريد", ORTH: "ص.ب"}]:
{LEMMA: "تلفون", ORTH: "ت."},
{LEMMA: "صندوق بريد", ORTH: "ص.ب"}]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
TOKENIZER_EXCEPTIONS = _exc TOKENIZER_EXCEPTIONS = _exc

@ -15,7 +15,7 @@ from ...util import update_exc
class BengaliDefaults(Language.Defaults): class BengaliDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: 'bn' lex_attr_getters[LANG] = lambda text: "bn"
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
tag_map = TAG_MAP tag_map = TAG_MAP
stop_words = STOP_WORDS stop_words = STOP_WORDS
@ -26,8 +26,8 @@ class BengaliDefaults(Language.Defaults):
class Bengali(Language): class Bengali(Language):
lang = 'bn' lang = "bn"
Defaults = BengaliDefaults Defaults = BengaliDefaults
__all__ = ['Bengali'] __all__ = ["Bengali"]

@ -13,11 +13,9 @@ LEMMA_RULES = {
["গাছা", ""], ["গাছা", ""],
["গাছি", ""], ["গাছি", ""],
["ছড়া", ""], ["ছড়া", ""],
["কে", ""], ["কে", ""],
["", ""], ["", ""],
["তে", ""], ["তে", ""],
["", ""], ["", ""],
["রা", ""], ["রা", ""],
["রে", ""], ["রে", ""],
@ -28,7 +26,6 @@ LEMMA_RULES = {
["গুলা", ""], ["গুলা", ""],
["গুলো", ""], ["গুলো", ""],
["গুলি", ""], ["গুলি", ""],
["কুল", ""], ["কুল", ""],
["গণ", ""], ["গণ", ""],
["দল", ""], ["দল", ""],
@ -45,7 +42,6 @@ LEMMA_RULES = {
["সকল", ""], ["সকল", ""],
["মহল", ""], ["মহল", ""],
["াবলি", ""], # আবলি ["াবলি", ""], # আবলি
# Bengali digit representations # Bengali digit representations
["", "0"], ["", "0"],
["", "1"], ["", "1"],
@ -58,11 +54,5 @@ LEMMA_RULES = {
["", "8"], ["", "8"],
["", "9"], ["", "9"],
], ],
"punct": [["“", '"'], ["”", '"'], ["\u2018", "'"], ["\u2019", "'"]],
"punct": [
["", "\""],
["", "\""],
["\u2018", "'"],
["\u2019", "'"]
]
} }

@ -6,63 +6,252 @@ from ...symbols import LEMMA, PRON_LEMMA
MORPH_RULES = { MORPH_RULES = {
"PRP": { "PRP": {
'': {LEMMA: PRON_LEMMA, 'PronType': 'Dem'}, "": {LEMMA: PRON_LEMMA, "PronType": "Dem"},
'আমাকে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'One', 'PronType': 'Prs', 'Case': 'Acc'}, "আমাকে": {
'কি': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Gender': 'Neut', 'PronType': 'Int', 'Case': 'Acc'}, LEMMA: PRON_LEMMA,
'সে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Three', 'PronType': 'Prs', 'Case': 'Nom'}, "Number": "Sing",
'কিসে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Gender': 'Neut', 'PronType': 'Int', 'Case': 'Acc'}, "Person": "One",
'তাকে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Three', 'PronType': 'Prs', 'Case': 'Acc'}, "PronType": "Prs",
'স্বয়ং': {LEMMA: PRON_LEMMA, 'Reflex': 'Yes', 'PronType': 'Ref'}, "Case": "Acc",
'কোনগুলো': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Gender': 'Neut', 'PronType': 'Int', 'Case': 'Acc'}, },
'তুমি': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Nom'}, "কি": {
'তুই': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Nom'}, LEMMA: PRON_LEMMA,
'তাদেরকে': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Three', 'PronType': 'Prs', 'Case': 'Acc'}, "Number": "Sing",
'আমরা': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'One ', 'PronType': 'Prs', 'Case': 'Nom'}, "Gender": "Neut",
'যিনি': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'PronType': 'Rel', 'Case': 'Nom'}, "PronType": "Int",
'আমাদেরকে': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'One', 'PronType': 'Prs', 'Case': 'Acc'}, "Case": "Acc",
'কোন': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'PronType': 'Int', 'Case': 'Acc'}, },
'কারা': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'PronType': 'Int', 'Case': 'Acc'}, "সে": {
'তোমাকে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Acc'}, LEMMA: PRON_LEMMA,
'তোকে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Acc'}, "Number": "Sing",
'খোদ': {LEMMA: PRON_LEMMA, 'Reflex': 'Yes', 'PronType': 'Ref'}, "Person": "Three",
'কে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'PronType': 'Int', 'Case': 'Acc'}, "PronType": "Prs",
'যারা': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'PronType': 'Rel', 'Case': 'Nom'}, "Case": "Nom",
'যে': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'PronType': 'Rel', 'Case': 'Nom'}, },
'তোমরা': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Nom'}, "কিসে": {
'তোরা': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Nom'}, LEMMA: PRON_LEMMA,
'তোমাদেরকে': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Acc'}, "Number": "Sing",
'তোদেরকে': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Two', 'PronType': 'Prs', 'Case': 'Acc'}, "Gender": "Neut",
'আপন': {LEMMA: PRON_LEMMA, 'Reflex': 'Yes', 'PronType': 'Ref'}, "PronType": "Int",
'': {LEMMA: PRON_LEMMA, 'PronType': 'Dem'}, "Case": "Acc",
'নিজ': {LEMMA: PRON_LEMMA, 'Reflex': 'Yes', 'PronType': 'Ref'}, },
'কার': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'PronType': 'Int', 'Case': 'Acc'}, "তাকে": {
'যা': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Gender': 'Neut', 'PronType': 'Rel', 'Case': 'Nom'}, LEMMA: PRON_LEMMA,
'তারা': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Three', 'PronType': 'Prs', 'Case': 'Nom'}, "Number": "Sing",
'আমি': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'One', 'PronType': 'Prs', 'Case': 'Nom'} "Person": "Three",
"PronType": "Prs",
"Case": "Acc",
},
"স্বয়ং": {LEMMA: PRON_LEMMA, "Reflex": "Yes", "PronType": "Ref"},
"কোনগুলো": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Gender": "Neut",
"PronType": "Int",
"Case": "Acc",
},
"তুমি": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Two",
"PronType": "Prs",
"Case": "Nom",
},
"তুই": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Two",
"PronType": "Prs",
"Case": "Nom",
},
"তাদেরকে": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Three",
"PronType": "Prs",
"Case": "Acc",
},
"আমরা": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "One ",
"PronType": "Prs",
"Case": "Nom",
},
"যিনি": {LEMMA: PRON_LEMMA, "Number": "Sing", "PronType": "Rel", "Case": "Nom"},
"আমাদেরকে": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "One",
"PronType": "Prs",
"Case": "Acc",
},
"কোন": {LEMMA: PRON_LEMMA, "Number": "Sing", "PronType": "Int", "Case": "Acc"},
"কারা": {LEMMA: PRON_LEMMA, "Number": "Plur", "PronType": "Int", "Case": "Acc"},
"তোমাকে": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Two",
"PronType": "Prs",
"Case": "Acc",
},
"তোকে": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Two",
"PronType": "Prs",
"Case": "Acc",
},
"খোদ": {LEMMA: PRON_LEMMA, "Reflex": "Yes", "PronType": "Ref"},
"কে": {LEMMA: PRON_LEMMA, "Number": "Sing", "PronType": "Int", "Case": "Acc"},
"যারা": {LEMMA: PRON_LEMMA, "Number": "Plur", "PronType": "Rel", "Case": "Nom"},
"যে": {LEMMA: PRON_LEMMA, "Number": "Sing", "PronType": "Rel", "Case": "Nom"},
"তোমরা": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Two",
"PronType": "Prs",
"Case": "Nom",
},
"তোরা": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Two",
"PronType": "Prs",
"Case": "Nom",
},
"তোমাদেরকে": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Two",
"PronType": "Prs",
"Case": "Acc",
},
"তোদেরকে": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Two",
"PronType": "Prs",
"Case": "Acc",
},
"আপন": {LEMMA: PRON_LEMMA, "Reflex": "Yes", "PronType": "Ref"},
"": {LEMMA: PRON_LEMMA, "PronType": "Dem"},
"নিজ": {LEMMA: PRON_LEMMA, "Reflex": "Yes", "PronType": "Ref"},
"কার": {LEMMA: PRON_LEMMA, "Number": "Sing", "PronType": "Int", "Case": "Acc"},
"যা": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Gender": "Neut",
"PronType": "Rel",
"Case": "Nom",
},
"তারা": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Three",
"PronType": "Prs",
"Case": "Nom",
},
"আমি": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "One",
"PronType": "Prs",
"Case": "Nom",
},
}, },
"PRP$": { "PRP$": {
"আমার": {
'আমার': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'One', 'PronType': 'Prs', 'Poss': 'Yes', LEMMA: PRON_LEMMA,
'Case': 'Nom'}, "Number": "Sing",
'মোর': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'One', 'PronType': 'Prs', 'Poss': 'Yes', "Person": "One",
'Case': 'Nom'}, "PronType": "Prs",
'মোদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'One', 'PronType': 'Prs', 'Poss': 'Yes', "Poss": "Yes",
'Case': 'Nom'}, "Case": "Nom",
'তার': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Three', 'PronType': 'Prs', 'Poss': 'Yes', },
'Case': 'Nom'}, "মোর": {
'তোমাদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Two', 'PronType': 'Prs', 'Poss': 'Yes', LEMMA: PRON_LEMMA,
'Case': 'Nom'}, "Number": "Sing",
'আমাদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'One', 'PronType': 'Prs', 'Poss': 'Yes', "Person": "One",
'Case': 'Nom'}, "PronType": "Prs",
'তোমার': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Two', 'PronType': 'Prs', 'Poss': 'Yes', "Poss": "Yes",
'Case': 'Nom'}, "Case": "Nom",
'তোর': {LEMMA: PRON_LEMMA, 'Number': 'Sing', 'Person': 'Two', 'PronType': 'Prs', 'Poss': 'Yes', },
'Case': 'Nom'}, "মোদের": {
'তাদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Three', 'PronType': 'Prs', 'Poss': 'Yes', LEMMA: PRON_LEMMA,
'Case': 'Nom'}, "Number": "Plur",
'কাদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'PronType': 'Int', 'Case': 'Acc'}, "Person": "One",
'তোদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'Person': 'Two', 'PronType': 'Prs', 'Poss': 'Yes', "PronType": "Prs",
'Case': 'Nom'}, "Poss": "Yes",
'যাদের': {LEMMA: PRON_LEMMA, 'Number': 'Plur', 'PronType': 'Int', 'Case': 'Acc'}, "Case": "Nom",
} },
"তার": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Three",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"তোমাদের": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Two",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"আমাদের": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "One",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"তোমার": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Two",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"তোর": {
LEMMA: PRON_LEMMA,
"Number": "Sing",
"Person": "Two",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"তাদের": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Three",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"কাদের": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"PronType": "Int",
"Case": "Acc",
},
"তোদের": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"Person": "Two",
"PronType": "Prs",
"Poss": "Yes",
"Case": "Nom",
},
"যাদের": {
LEMMA: PRON_LEMMA,
"Number": "Plur",
"PronType": "Int",
"Case": "Acc",
},
},
} }

@ -2,29 +2,45 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, LIST_ICONS from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, LIST_ICONS
from ..char_classes import ALPHA_LOWER, ALPHA_UPPER, ALPHA, HYPHENS, QUOTES, UNITS from ..char_classes import ALPHA_LOWER, ALPHA, HYPHENS, QUOTES, UNITS
_currency = r"\$|¢|£|€|¥|฿|৳" _currency = r"\$|¢|£|€|¥|฿|৳"
_quotes = QUOTES.replace("'", '') _quotes = QUOTES.replace("'", "")
_list_punct = LIST_PUNCT + '। ॥'.strip().split() _list_punct = LIST_PUNCT + "। ॥".strip().split()
_prefixes = ([r'\+'] + _list_punct + LIST_ELLIPSES + LIST_QUOTES + LIST_ICONS) _prefixes = [r"\+"] + _list_punct + LIST_ELLIPSES + LIST_QUOTES + LIST_ICONS
_suffixes = (_list_punct + LIST_ELLIPSES + LIST_QUOTES + LIST_ICONS + _suffixes = (
[r'(?<=[0-9])\+', _list_punct
r'(?<=°[FfCcKk])\.', + LIST_ELLIPSES
r'(?<=[0-9])(?:{})'.format(_currency), + LIST_QUOTES
r'(?<=[0-9])(?:{})'.format(UNITS), + LIST_ICONS
r'(?<=[{}(?:{})])\.'.format('|'.join([ALPHA_LOWER, r'%²\-\)\]\+', QUOTES]), _currency)]) + [
r"(?<=[0-9])\+",
r"(?<=°[FfCcKk])\.",
r"(?<=[0-9])(?:{})".format(_currency),
r"(?<=[0-9])(?:{})".format(UNITS),
r"(?<=[{}(?:{})])\.".format(
"|".join([ALPHA_LOWER, r"%²\-\)\]\+", QUOTES]), _currency
),
]
)
_infixes = (LIST_ELLIPSES + LIST_ICONS + _infixes = (
[r'(?<=[0-9{zero}-{nine}])[+\-\*^=](?=[0-9{zero}-{nine}-])'.format(zero=u'', nine=u''), LIST_ELLIPSES
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA), + LIST_ICONS
r'(?<=[{a}])[{h}](?={ae})'.format(a=ALPHA, h=HYPHENS, ae=u''), + [
r"(?<=[0-9{zero}-{nine}])[+\-\*^=](?=[0-9{zero}-{nine}-])".format(
zero="০", nine="৯"
),
r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
r"(?<=[{a}])[{h}](?={ae})".format(a=ALPHA, h=HYPHENS, ae=""),
r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS), r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS),
r'(?<=[{a}"])[:<>=/](?=[{a}])'.format(a=ALPHA)]) r'(?<=[{a}"])[:<>=/](?=[{a}])'.format(a=ALPHA),
]
)
TOKENIZER_PREFIXES = _prefixes TOKENIZER_PREFIXES = _prefixes
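These `TOKENIZER_PREFIXES` / `TOKENIZER_SUFFIXES` / `TOKENIZER_INFIXES` lists aren't used as raw strings: when a `Language` subclass builds its tokenizer, they are compiled into regexes. Roughly, the wiring looks like the sketch below (a simplified version of what the language defaults do internally, not code from this diff):

```python
from spacy.lang.bn import Bengali
from spacy.tokenizer import Tokenizer
from spacy.util import compile_infix_regex, compile_prefix_regex, compile_suffix_regex

nlp = Bengali()
defaults = Bengali.Defaults

# Each list of pattern strings is joined into one compiled regex.
prefix_re = compile_prefix_regex(defaults.prefixes)
suffix_re = compile_suffix_regex(defaults.suffixes)
infix_re = compile_infix_regex(defaults.infixes)

tokenizer = Tokenizer(
    nlp.vocab,
    rules=defaults.tokenizer_exceptions,
    prefix_search=prefix_re.search,
    suffix_search=suffix_re.search,
    infix_finditer=infix_re.finditer,
)
```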

@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
STOP_WORDS = set(""" STOP_WORDS = set(
"""
অতএব অথচ অথব অন অন অন অন অনতত অবধি অবশ অর অন অন অরধভ অতএব অথচ অথব অন অন অন অন অনতত অবধি অবশ অর অন অন অরধভ
আগ আগ আগ আছ আজ আদযভ আপন আপনি আব আমর আম আম আম আমি আর আরও আগ আগ আগ আছ আজ আদযভ আপন আপনি আব আমর আম আম আম আমি আর আরও
ইতি ইহ ইতি ইহ
@ -41,4 +42,5 @@ STOP_WORDS = set("""
রণ মন সঙ সঙ সব সব সমস সমরতি সময় সহ সহি তর ি পষ বয রণ মন সঙ সঙ সব সব সমস সমরতি সময় সহ সহি তর ি পষ বয
হইত হইব হইয হওয হওয হওয হচ হত হত হত হন হব হব হয হয হযি হয হয হযি হয হইত হইব হইয হওয হওয হওয হচ হত হত হত হন হব হব হয হয হযি হয হয হযি হয
হয় হল হল হল হল হল ি ি হয় হয় হয় হইয় হয়ি হয় হয়নি হয় হয়ত হওয় হওয় হওয় হয় হল হল হল হল হল ি ি হয় হয় হয় হইয় হয়ি হয় হয়নি হয় হয়ত হওয় হওয় হওয়
""".split()) """.split()
)

View File

@ -11,7 +11,7 @@ TAG_MAP = {
"-LRB-": {POS: PUNCT, "PunctType": "brck", "PunctSide": "ini"}, "-LRB-": {POS: PUNCT, "PunctType": "brck", "PunctSide": "ini"},
"-RRB-": {POS: PUNCT, "PunctType": "brck", "PunctSide": "fin"}, "-RRB-": {POS: PUNCT, "PunctType": "brck", "PunctSide": "fin"},
"``": {POS: PUNCT, "PunctType": "quot", "PunctSide": "ini"}, "``": {POS: PUNCT, "PunctType": "quot", "PunctSide": "ini"},
"\"\"": {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"}, '""': {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"},
"''": {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"}, "''": {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"},
":": {POS: PUNCT}, ":": {POS: PUNCT},
"": {POS: SYM, "Other": {"SymType": "currency"}}, "": {POS: SYM, "Other": {"SymType": "currency"}},
@ -42,7 +42,6 @@ TAG_MAP = {
"RBR": {POS: ADV, "Degree": "comp"}, "RBR": {POS: ADV, "Degree": "comp"},
"RBS": {POS: ADV, "Degree": "sup"}, "RBS": {POS: ADV, "Degree": "sup"},
"RP": {POS: PART}, "RP": {POS: PART},
"SYM": {POS: SYM},
"TO": {POS: PART, "PartType": "inf", "VerbForm": "inf"}, "TO": {POS: PART, "PartType": "inf", "VerbForm": "inf"},
"UH": {POS: INTJ}, "UH": {POS: INTJ},
"VB": {POS: VERB, "VerbForm": "inf"}, "VB": {POS: VERB, "VerbForm": "inf"},
@ -50,7 +49,13 @@ TAG_MAP = {
"VBG": {POS: VERB, "VerbForm": "part", "Tense": "pres", "Aspect": "prog"}, "VBG": {POS: VERB, "VerbForm": "part", "Tense": "pres", "Aspect": "prog"},
"VBN": {POS: VERB, "VerbForm": "part", "Tense": "past", "Aspect": "perf"}, "VBN": {POS: VERB, "VerbForm": "part", "Tense": "past", "Aspect": "perf"},
"VBP": {POS: VERB, "VerbForm": "fin", "Tense": "pres"}, "VBP": {POS: VERB, "VerbForm": "fin", "Tense": "pres"},
"VBZ": {POS: VERB, "VerbForm": "fin", "Tense": "pres", "Number": "sing", "Person": 3}, "VBZ": {
POS: VERB,
"VerbForm": "fin",
"Tense": "pres",
"Number": "sing",
"Person": 3,
},
"WDT": {POS: ADJ, "PronType": "int|rel"}, "WDT": {POS: ADJ, "PronType": "int|rel"},
"WP": {POS: NOUN, "PronType": "int|rel"}, "WP": {POS: NOUN, "PronType": "int|rel"},
"WP$": {POS: ADJ, "Poss": "yes", "PronType": "int|rel"}, "WP$": {POS: ADJ, "Poss": "yes", "PronType": "int|rel"},

@ -19,7 +19,8 @@ for exc_data in [
{ORTH: "কি.মি", LEMMA: "কিলোমিটার"}, {ORTH: "কি.মি", LEMMA: "কিলোমিটার"},
{ORTH: "সে.মি.", LEMMA: "সেন্টিমিটার"}, {ORTH: "সে.মি.", LEMMA: "সেন্টিমিটার"},
{ORTH: "সে.মি", LEMMA: "সেন্টিমিটার"}, {ORTH: "সে.মি", LEMMA: "সেন্টিমিটার"},
{ORTH: "মি.লি.", LEMMA: "মিলিলিটার"}]: {ORTH: "মি.লি.", LEMMA: "মিলিলিটার"},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]

@ -4,13 +4,6 @@ from __future__ import unicode_literals
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .stop_words import STOP_WORDS from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS from .lex_attrs import LEX_ATTRS
# uncomment if files are available
# from .norm_exceptions import NORM_EXCEPTIONS
# from .tag_map import TAG_MAP
# from .morph_rules import MORPH_RULES
# uncomment if lookup-based lemmatizer is available
from .lemmatizer import LOOKUP from .lemmatizer import LOOKUP
from ..tokenizer_exceptions import BASE_EXCEPTIONS from ..tokenizer_exceptions import BASE_EXCEPTIONS
@ -19,46 +12,22 @@ from ...language import Language
from ...attrs import LANG, NORM from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups from ...util import update_exc, add_lookups
# Create a Language subclass
# Documentation: https://spacy.io/docs/usage/adding-languages
# This file should be placed in spacy/lang/ca (ISO code of language).
# Before submitting a pull request, make sure the remove all comments from the
# language data files, and run at least the basic tokenizer tests. Simply add the
# language ID to the list of languages in spacy/tests/conftest.py to include it
# in the basic tokenizer sanity tests. You can optionally add a fixture for the
# language's tokenizer and add more specific tests. For more info, see the
# tests documentation: https://github.com/explosion/spaCy/tree/master/spacy/tests
class CatalanDefaults(Language.Defaults): class CatalanDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: 'ca' # ISO code lex_attr_getters[LANG] = lambda text: "ca"
# add more norm exception dictionaries here lex_attr_getters[NORM] = add_lookups(
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS) Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
)
# overwrite functions for lexical attributes
lex_attr_getters.update(LEX_ATTRS) lex_attr_getters.update(LEX_ATTRS)
# add custom tokenizer exceptions to base exceptions
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
# add stop words
stop_words = STOP_WORDS stop_words = STOP_WORDS
# if available: add tag map
# tag_map = dict(TAG_MAP)
# if available: add morph rules
# morph_rules = dict(MORPH_RULES)
lemma_lookup = LOOKUP lemma_lookup = LOOKUP
class Catalan(Language): class Catalan(Language):
lang = 'ca' # ISO code lang = "ca"
Defaults = CatalanDefaults # set Defaults to custom language defaults Defaults = CatalanDefaults
# set default export this allows the language class to be lazy-loaded __all__ = ["Catalan"]
__all__ = ['Catalan']

@ -5,7 +5,7 @@ from __future__ import unicode_literals
""" """
Example sentences to test spaCy and its language models. Example sentences to test spaCy and its language models.
>>> from spacy.lang.es.examples import sentences >>> from spacy.lang.ca.examples import sentences
>>> docs = nlp.pipe(sentences) >>> docs = nlp.pipe(sentences)
""" """

@ -1,33 +1,57 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
# import the symbols for the attrs you want to overwrite
from ...attrs import LIKE_NUM from ...attrs import LIKE_NUM
# Overwriting functions for lexical attributes _num_words = [
# Documentation: https://localhost:1234/docs/usage/adding-languages#lex-attrs "zero",
# Most of these functions, like is_lower or like_url should be language- "un",
# independent. Others, like like_num (which includes both digits and number "dos",
# words), requires customisation. "tres",
"quatre",
"cinc",
# Example: check if token resembles a number "sis",
"set",
_num_words = ['zero', 'un', 'dos', 'tres', 'quatre', 'cinc', 'sis', 'set', "vuit",
'vuit', 'nou', 'deu', 'onze', 'dotze', 'tretze', 'catorze', "nou",
'quinze', 'setze', 'disset', 'divuit', 'dinou', 'vint', "deu",
'trenta', 'quaranta', 'cinquanta', 'seixanta', 'setanta', 'vuitanta', 'noranta', "onze",
'cent', 'mil', 'milió', 'bilió', 'trilió', 'quatrilió', "dotze",
'gazilió', 'bazilió'] "tretze",
"catorze",
"quinze",
"setze",
"disset",
"divuit",
"dinou",
"vint",
"trenta",
"quaranta",
"cinquanta",
"seixanta",
"setanta",
"vuitanta",
"noranta",
"cent",
"mil",
"milió",
"bilió",
"trilió",
"quatrilió",
"gazilió",
"bazilió",
]
def like_num(text): def like_num(text):
text = text.replace(',', '').replace('.', '') if text.startswith(("+", "-", "±", "~")):
text = text[1:]
text = text.replace(",", "").replace(".", "")
if text.isdigit(): if text.isdigit():
return True return True
if text.count('/') == 1: if text.count("/") == 1:
num, denom = text.split('/') num, denom = text.split("/")
if num.isdigit() and denom.isdigit(): if num.isdigit() and denom.isdigit():
return True return True
if text in _num_words: if text in _num_words:
@ -35,9 +59,4 @@ def like_num(text):
return False return False
# Create dictionary of functions to overwrite. The default lex_attr_getters are LEX_ATTRS = {LIKE_NUM: like_num}
# updated with this one, so only the functions defined here are overwritten.
LEX_ATTRS = {
LIKE_NUM: like_num
}

@ -2,9 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
# Stop words STOP_WORDS = set(
"""
STOP_WORDS = set("""
a abans ací ah així això al aleshores algun alguna algunes alguns alhora allà allí allò a abans ací ah així això al aleshores algun alguna algunes alguns alhora allà allí allò
als altra altre altres amb ambdues ambdós anar ans apa aquell aquella aquelles aquells als altra altre altres amb ambdues ambdós anar ans apa aquell aquella aquelles aquells
aquest aquesta aquestes aquests aquí aquest aquesta aquestes aquests aquí
@ -53,4 +52,5 @@ un una unes uns us últim ús
va vaig vam van vas veu vosaltres vostra vostre vostres va vaig vam van vas veu vosaltres vostra vostre vostres
""".split()) """.split()
)

@ -5,14 +5,6 @@ from ..symbols import POS, ADV, NOUN, ADP, PRON, SCONJ, PROPN, DET, SYM, INTJ
from ..symbols import PUNCT, NUM, AUX, X, CONJ, ADJ, VERB, PART, SPACE, CCONJ from ..symbols import PUNCT, NUM, AUX, X, CONJ, ADJ, VERB, PART, SPACE, CCONJ
# Add a tag map
# Documentation: https://spacy.io/docs/usage/adding-languages#tag-map
# Universal Dependencies: http://universaldependencies.org/u/pos/all.html
# The keys of the tag map should be strings in your tag set. The dictionary must
# have an entry POS whose value is one of the Universal Dependencies tags.
# Optionally, you can also include morphological features or other attributes.
TAG_MAP = { TAG_MAP = {
"ADV": {POS: ADV}, "ADV": {POS: ADV},
"NOUN": {POS: NOUN}, "NOUN": {POS: NOUN},
@ -32,5 +24,5 @@ TAG_MAP = {
"ADJ": {POS: ADJ}, "ADJ": {POS: ADJ},
"VERB": {POS: VERB}, "VERB": {POS: VERB},
"PART": {POS: PART}, "PART": {POS: PART},
"SP": {POS: SPACE} "SP": {POS: SPACE},
} }

@ -1,8 +1,7 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
# import symbols if you need to use more, add them here from ...symbols import ORTH, LEMMA
from ...symbols import ORTH, LEMMA, TAG, NORM, ADP, DET
_exc = {} _exc = {}
@ -25,27 +24,18 @@ for exc_data in [
{ORTH: "Srta.", LEMMA: "senyoreta"}, {ORTH: "Srta.", LEMMA: "senyoreta"},
{ORTH: "núm", LEMMA: "número"}, {ORTH: "núm", LEMMA: "número"},
{ORTH: "St.", LEMMA: "sant"}, {ORTH: "St.", LEMMA: "sant"},
{ORTH: "Sta.", LEMMA: "santa"}]: {ORTH: "Sta.", LEMMA: "santa"},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
# Times # Times
_exc["12m."] = [{ORTH: "12"}, {ORTH: "m.", LEMMA: "p.m."}]
_exc["12m."] = [
{ORTH: "12"},
{ORTH: "m.", LEMMA: "p.m."}]
for h in range(1, 12 + 1): for h in range(1, 12 + 1):
for period in ["a.m.", "am"]: for period in ["a.m.", "am"]:
_exc["%d%s" % (h, period)] = [ _exc["%d%s" % (h, period)] = [{ORTH: "%d" % h}, {ORTH: period, LEMMA: "a.m."}]
{ORTH: "%d" % h},
{ORTH: period, LEMMA: "a.m."}]
for period in ["p.m.", "pm"]: for period in ["p.m.", "pm"]:
_exc["%d%s" % (h, period)] = [ _exc["%d%s" % (h, period)] = [{ORTH: "%d" % h}, {ORTH: period, LEMMA: "p.m."}]
{ORTH: "%d" % h},
{ORTH: period, LEMMA: "p.m."}]
# To keep things clean and readable, it's recommended to only declare the
# TOKENIZER_EXCEPTIONS at the bottom:
TOKENIZER_EXCEPTIONS = _exc TOKENIZER_EXCEPTIONS = _exc

@ -4,23 +4,23 @@ from __future__ import unicode_literals
import regex as re import regex as re
re.DEFAULT_VERSION = re.VERSION1 re.DEFAULT_VERSION = re.VERSION1
merge_char_classes = lambda classes: '[{}]'.format('||'.join(classes)) merge_char_classes = lambda classes: "[{}]".format("||".join(classes))
split_chars = lambda char: list(char.strip().split(' ')) split_chars = lambda char: list(char.strip().split(" "))
merge_chars = lambda char: char.strip().replace(' ', '|') merge_chars = lambda char: char.strip().replace(" ", "|")
_bengali = r'[\p{L}&&\p{Bengali}]' _bengali = r"[\p{L}&&\p{Bengali}]"
_hebrew = r'[\p{L}&&\p{Hebrew}]' _hebrew = r"[\p{L}&&\p{Hebrew}]"
_latin_lower = r'[\p{Ll}&&\p{Latin}]' _latin_lower = r"[\p{Ll}&&\p{Latin}]"
_latin_upper = r'[\p{Lu}&&\p{Latin}]' _latin_upper = r"[\p{Lu}&&\p{Latin}]"
_latin = r'[[\p{Ll}||\p{Lu}]&&\p{Latin}]' _latin = r"[[\p{Ll}||\p{Lu}]&&\p{Latin}]"
_persian = r'[\p{L}&&\p{Arabic}]' _persian = r"[\p{L}&&\p{Arabic}]"
_russian_lower = r'[ёа-я]' _russian_lower = r"[ёа-я]"
_russian_upper = r'[ЁА-Я]' _russian_upper = r"[ЁА-Я]"
_sinhala = r'[\p{L}&&\p{Sinhala}]' _sinhala = r"[\p{L}&&\p{Sinhala}]"
_tatar_lower = r'[әөүҗңһ]' _tatar_lower = r"[әөүҗңһ]"
_tatar_upper = r'[ӘӨҮҖҢҺ]' _tatar_upper = r"[ӘӨҮҖҢҺ]"
_greek_lower = r'[α-ωάέίόώήύ]' _greek_lower = r"[α-ωάέίόώήύ]"
_greek_upper = r'[Α-ΩΆΈΊΌΏΉΎ]' _greek_upper = r"[Α-ΩΆΈΊΌΏΉΎ]"
_upper = [_latin_upper, _russian_upper, _tatar_upper, _greek_upper] _upper = [_latin_upper, _russian_upper, _tatar_upper, _greek_upper]
_lower = [_latin_lower, _russian_lower, _tatar_lower, _greek_lower] _lower = [_latin_lower, _russian_lower, _tatar_lower, _greek_lower]
@ -30,23 +30,27 @@ ALPHA = merge_char_classes(_upper + _lower + _uncased)
ALPHA_LOWER = merge_char_classes(_lower + _uncased) ALPHA_LOWER = merge_char_classes(_lower + _uncased)
ALPHA_UPPER = merge_char_classes(_upper + _uncased) ALPHA_UPPER = merge_char_classes(_upper + _uncased)
_units = ('km km² km³ m m² m³ dm dm² dm³ cm cm² cm³ mm mm² mm³ ha µm nm yd in ft ' _units = (
'kg g mg µg t lb oz m/s km/h kmh mph hPa Pa mbar mb MB kb KB gb GB tb ' "km km² km³ m m² m³ dm dm² dm³ cm cm² cm³ mm mm² mm³ ha µm nm yd in ft "
'TB T G M K % км км² км³ м м² м³ дм дм² дм³ см см² см³ мм мм² мм³ нм ' "kg g mg µg t lb oz m/s km/h kmh mph hPa Pa mbar mb MB kb KB gb GB tb "
'кг г мг м/с км/ч кПа Па мбар Кб КБ кб Мб МБ мб Гб ГБ гб Тб ТБ тб' "TB T G M K % км км² км³ м м² м³ дм дм² дм³ см см² см³ мм мм² мм³ нм "
'كم كم² كم³ م م² م³ سم سم² سم³ مم مم² مم³ كم غرام جرام جم كغ ملغ كوب اكواب') "кг г мг м/с км/ч кПа Па мбар Кб КБ кб Мб МБ мб Гб ГБ гб Тб ТБ тб"
_currency = r'\$ £ € ¥ ฿ US\$ C\$ A\$ ₽ ﷼' "كم كم² كم³ م م² م³ سم سم² سم³ مم مم² مم³ كم غرام جرام جم كغ ملغ كوب اكواب"
)
_currency = r"\$ £ € ¥ ฿ US\$ C\$ A\$ ₽ ﷼"
# These expressions contain various unicode variations, including characters # These expressions contain various unicode variations, including characters
# used in Chinese (see #1333, #1340, #1351) unless there are cross-language # used in Chinese (see #1333, #1340, #1351) unless there are cross-language
# conflicts, spaCy's base tokenizer should handle all of those by default # conflicts, spaCy's base tokenizer should handle all of those by default
_punct = r'… …… , : ; \! \? ¿ ؟ ¡ \( \) \[ \] \{ \} < > _ # \* & 。 · । ، ؛ ٪' _punct = (
r"… …… , : ; \! \? ¿ ؟ ¡ \( \) \[ \] \{ \} < > _ # \* & 。 · । ، ؛ ٪"
)
_quotes = r'\' \'\' " ” “ `` ` ´ , „ » « 「 」 『 』 【 】 《 》 〈 〉' _quotes = r'\' \'\' " ” “ `` ` ´ , „ » « 「 」 『 』 【 】 《 》 〈 〉'
_hyphens = '- — -- --- —— ~' _hyphens = "- — -- --- —— ~"
# Various symbols like dingbats, but also emoji # Various symbols like dingbats, but also emoji
# Details: https://www.compart.com/en/unicode/category/So # Details: https://www.compart.com/en/unicode/category/So
_other_symbols = r'[\p{So}]' _other_symbols = r"[\p{So}]"
UNITS = merge_chars(_units) UNITS = merge_chars(_units)
CURRENCY = merge_chars(_currency) CURRENCY = merge_chars(_currency)
@ -60,5 +64,5 @@ LIST_CURRENCY = split_chars(_currency)
LIST_QUOTES = split_chars(_quotes) LIST_QUOTES = split_chars(_quotes)
LIST_PUNCT = split_chars(_punct) LIST_PUNCT = split_chars(_punct)
LIST_HYPHENS = split_chars(_hyphens) LIST_HYPHENS = split_chars(_hyphens)
LIST_ELLIPSES = [r'\.\.+', ''] LIST_ELLIPSES = [r"\.\.+", ""]
LIST_ICONS = [_other_symbols] LIST_ICONS = [_other_symbols]
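The three lambdas at the top of this module do the heavy lifting: they turn the space-separated character lists below into regex alternations and character-class unions (the `||` set-union syntax comes from the `regex` package's VERSION1 mode used here). A small standalone illustration, independent of spaCy:

```python
# Same helpers as above, shown in isolation with a toy currency list.
merge_char_classes = lambda classes: "[{}]".format("||".join(classes))
split_chars = lambda char: list(char.strip().split(" "))
merge_chars = lambda char: char.strip().replace(" ", "|")

_currency = r"\$ £ € ¥"
print(split_chars(_currency))   # ['\\$', '£', '€', '¥']  -> list of patterns
print(merge_chars(_currency))   # \$|£|€|¥                -> alternation group
print(merge_char_classes([r"\p{Ll}", r"\p{Lu}"]))  # [\p{Ll}||\p{Lu}]
```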
View File
@ -20,9 +20,10 @@ from ...util import update_exc, add_lookups
class DanishDefaults(Language.Defaults): class DanishDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS) lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = lambda text: 'da' lex_attr_getters[LANG] = lambda text: "da"
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], lex_attr_getters[NORM] = add_lookups(
BASE_NORMS, NORM_EXCEPTIONS) Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
morph_rules = MORPH_RULES morph_rules = MORPH_RULES
infixes = TOKENIZER_INFIXES infixes = TOKENIZER_INFIXES
@ -33,8 +34,8 @@ class DanishDefaults(Language.Defaults):
class Danish(Language): class Danish(Language):
lang = 'da' lang = "da"
Defaults = DanishDefaults Defaults = DanishDefaults
__all__ = ['Danish'] __all__ = ["Danish"]
View File
@ -14,5 +14,5 @@ sentences = [
"Apple overvejer at købe et britisk startup for 1 milliard dollar", "Apple overvejer at købe et britisk startup for 1 milliard dollar",
"Selvkørende biler flytter forsikringsansvaret over på producenterne", "Selvkørende biler flytter forsikringsansvaret over på producenterne",
"San Francisco overvejer at forbyde udbringningsrobotter på fortov", "San Francisco overvejer at forbyde udbringningsrobotter på fortov",
"London er en stor by i Storbritannien" "London er en stor by i Storbritannien",
] ]
View File
@ -3,8 +3,8 @@ from __future__ import unicode_literals
from ...attrs import LIKE_NUM from ...attrs import LIKE_NUM
# Source http://fjern-uv.dk/tal.php
# Source http://fjern-uv.dk/tal.php
_num_words = """nul _num_words = """nul
en et to tre fire fem seks syv otte ni ti en et to tre fire fem seks syv otte ni ti
elleve tolv tretten fjorten femten seksten sytten atten nitten tyve elleve tolv tretten fjorten femten seksten sytten atten nitten tyve
@ -19,8 +19,8 @@ enoghalvfems tooghalvfems treoghalvfems fireoghalvfems femoghalvfems seksoghalvf
million milliard billion billiard trillion trilliard million milliard billion billiard trillion trilliard
""".split() """.split()
# source http://www.duda.dk/video/dansk/grammatik/talord/talord.html
# Source: http://www.duda.dk/video/dansk/grammatik/talord/talord.html
_ordinal_words = """nulte _ordinal_words = """nulte
første anden tredje fjerde femte sjette syvende ottende niende tiende første anden tredje fjerde femte sjette syvende ottende niende tiende
elfte tolvte trettende fjortende femtende sekstende syttende attende nittende tyvende elfte tolvte trettende fjortende femtende sekstende syttende attende nittende tyvende
@ -33,14 +33,15 @@ enogfirsindstyvende toogfirsindstyvende treogfirsindstyvende fireogfirsindstyven
enoghalvfemsindstyvende tooghalvfemsindstyvende treoghalvfemsindstyvende fireoghalvfemsindstyvende femoghalvfemsindstyvende seksoghalvfemsindstyvende syvoghalvfemsindstyvende otteoghalvfemsindstyvende nioghalvfemsindstyvende enoghalvfemsindstyvende tooghalvfemsindstyvende treoghalvfemsindstyvende fireoghalvfemsindstyvende femoghalvfemsindstyvende seksoghalvfemsindstyvende syvoghalvfemsindstyvende otteoghalvfemsindstyvende nioghalvfemsindstyvende
""".split() """.split()
def like_num(text): def like_num(text):
if text.startswith(('+', '-', '±', '~')): if text.startswith(("+", "-", "±", "~")):
text = text[1:] text = text[1:]
text = text.replace(',', '').replace('.', '') text = text.replace(",", "").replace(".", "")
if text.isdigit(): if text.isdigit():
return True return True
if text.count('/') == 1: if text.count("/") == 1:
num, denom = text.split('/') num, denom = text.split("/")
if num.isdigit() and denom.isdigit(): if num.isdigit() and denom.isdigit():
return True return True
if text.lower() in _num_words: if text.lower() in _num_words:
@ -49,6 +50,5 @@ def like_num(text):
return True return True
return False return False
LEX_ATTRS = {
LIKE_NUM: like_num LEX_ATTRS = {LIKE_NUM: like_num}
}
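`like_num` is the only piece of logic in this file; the rest is data. A trimmed, runnable sketch of the same checks (sign prefix, digit and fraction forms, then the word list), using a toy vocabulary instead of the full Danish number and ordinal lists:

```python
# Toy word list standing in for the full _num_words/_ordinal_words data.
_num_words = "nul en et to tre fire fem".split()

def like_num(text):
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")   # strip separators
    if text.isdigit():
        return True
    if text.count("/") == 1:                        # simple fractions
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text.lower() in _num_words

assert like_num("10.000")    # thousands separator is stripped
assert like_num("3/4")
assert like_num("fem")
assert not like_num("hund")
```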
View File
@ -11,53 +11,299 @@ from ...symbols import LEMMA, PRON_LEMMA
MORPH_RULES = { MORPH_RULES = {
"PRON": { "PRON": {
"jeg": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Number=Sing|Person=1|PronType=Prs "jeg": {
"mig": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Sing|Person=1|PronType=Prs LEMMA: PRON_LEMMA,
"min": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Poss": "Yes", "Gender": "Com"}, # Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs "PronType": "Prs",
"mit": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Poss": "Yes", "Gender": "Neut"}, # Gender=Neut|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs "Person": "One",
"vor": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Poss": "Yes", "Gender": "Com"}, # Gender=Com|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form "Number": "Sing",
"vort": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Poss": "Yes", "Gender": "Neut"}, # Gender=Neut|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form "Case": "Nom",
"du": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Sing", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Number=Sing|Person=2|PronType=Prs "Gender": "Com",
"dig": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Sing", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Sing|Person=2|PronType=Prs }, # Case=Nom|Gender=Com|Number=Sing|Person=1|PronType=Prs
"din": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Sing", "Poss": "Yes", "Gender": "Com"}, # Gender=Com|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs "mig": {
"dit": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Sing", "Poss": "Yes", "Gender": "Neut"}, # Gender=Neut|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs LEMMA: PRON_LEMMA,
"han": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs "PronType": "Prs",
"hun": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs "Person": "One",
"den": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs, See note above. "Number": "Sing",
"det": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Neut"}, # Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs See note above. "Case": "Acc",
"ham": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs "Gender": "Com",
"hende": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs }, # Case=Acc|Gender=Com|Number=Sing|Person=1|PronType=Prs
"sin": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Poss": "Yes", "Gender": "Com", "Reflex": "Yes"}, # Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes "min": {
"sit": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Poss": "Yes", "Gender": "Neut", "Reflex": "Yes"}, # Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes LEMMA: PRON_LEMMA,
"PronType": "Prs",
"vi": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Number=Plur|Person=1|PronType=Prs "Person": "One",
"os": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Plur|Person=1|PronType=Prs "Number": "Sing",
"mine": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Poss": "Yes"}, # Number=Plur|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs "Poss": "Yes",
"vore": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Poss": "Yes"}, # Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form "Gender": "Com",
"I": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Plur", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Number=Plur|Person=2|PronType=Prs }, # Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs
"jer": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Plur", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Number=Plur|Person=2|PronType=Prs "mit": {
"dine": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Plur", "Poss": "Yes"}, # Number=Plur|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs LEMMA: PRON_LEMMA,
"de": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Case": "Nom"}, # Case=Nom|Number=Plur|Person=3|PronType=Prs "PronType": "Prs",
"dem": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Case": "Acc"}, # Case=Acc|Number=Plur|Person=3|PronType=Prs "Person": "One",
"sine": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Poss": "Yes", "Reflex": "Yes"}, # Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes "Number": "Sing",
"Poss": "Yes",
"vores": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Poss": "Yes"}, # Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs "Gender": "Neut",
"De": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Case": "Nom", "Gender": "Com"}, # Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs }, # Gender=Neut|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs
"Dem": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Case": "Acc", "Gender": "Com"}, # Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs "vor": {
"Deres": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Poss": "Yes"}, # Person=2|Polite=Form|Poss=Yes|PronType=Prs LEMMA: PRON_LEMMA,
"jeres": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Poss": "Yes"}, # Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs "PronType": "Prs",
"sig": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Case": "Acc", "Reflex": "Yes"}, # Case=Acc|Person=3|PronType=Prs|Reflex=Yes "Person": "One",
"hans": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Poss": "Yes"}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs "Number": "Sing",
"hendes": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Poss": "Yes"}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs "Poss": "Yes",
"dens": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Poss": "Yes"}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs "Gender": "Com",
"dets": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Poss": "Yes"}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs }, # Gender=Com|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form
"deres": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Poss": "Yes"}, # Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs "vort": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Sing",
"Poss": "Yes",
"Gender": "Neut",
}, # Gender=Neut|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form
"du": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Sing",
"Case": "Nom",
"Gender": "Com",
}, # Case=Nom|Gender=Com|Number=Sing|Person=2|PronType=Prs
"dig": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Sing",
"Case": "Acc",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Number=Sing|Person=2|PronType=Prs
"din": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Sing",
"Poss": "Yes",
"Gender": "Com",
}, # Gender=Com|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs
"dit": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Sing",
"Poss": "Yes",
"Gender": "Neut",
}, # Gender=Neut|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs
"han": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Nom",
"Gender": "Com",
}, # Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs
"hun": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Nom",
"Gender": "Com",
}, # Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs
"den": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs, See note above.
"det": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Neut",
}, # Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs See note above.
"ham": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Acc",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs
"hende": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Acc",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs
"sin": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Poss": "Yes",
"Gender": "Com",
"Reflex": "Yes",
}, # Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes
"sit": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Poss": "Yes",
"Gender": "Neut",
"Reflex": "Yes",
}, # Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes
"vi": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Case": "Nom",
"Gender": "Com",
}, # Case=Nom|Gender=Com|Number=Plur|Person=1|PronType=Prs
"os": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Case": "Acc",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Number=Plur|Person=1|PronType=Prs
"mine": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Poss": "Yes",
}, # Number=Plur|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs
"vore": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Poss": "Yes",
}, # Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form
"I": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Plur",
"Case": "Nom",
"Gender": "Com",
}, # Case=Nom|Gender=Com|Number=Plur|Person=2|PronType=Prs
"jer": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Plur",
"Case": "Acc",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Number=Plur|Person=2|PronType=Prs
"dine": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Plur",
"Poss": "Yes",
}, # Number=Plur|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs
"de": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Case": "Nom",
}, # Case=Nom|Number=Plur|Person=3|PronType=Prs
"dem": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Case": "Acc",
}, # Case=Acc|Number=Plur|Person=3|PronType=Prs
"sine": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Poss": "Yes",
"Reflex": "Yes",
}, # Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes
"vores": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Poss": "Yes",
}, # Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs
"De": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Case": "Nom",
"Gender": "Com",
}, # Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs
"Dem": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Case": "Acc",
"Gender": "Com",
}, # Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs
"Deres": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Poss": "Yes",
}, # Person=2|Polite=Form|Poss=Yes|PronType=Prs
"jeres": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Poss": "Yes",
}, # Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs
"sig": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Case": "Acc",
"Reflex": "Yes",
}, # Case=Acc|Person=3|PronType=Prs|Reflex=Yes
"hans": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Poss": "Yes",
}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs
"hendes": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Poss": "Yes",
}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs
"dens": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Poss": "Yes",
}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs
"dets": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Poss": "Yes",
}, # Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs
"deres": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Poss": "Yes",
}, # Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs
}, },
"VERB": { "VERB": {
"er": {LEMMA: "være", "VerbForm": "Fin", "Tense": "Pres"}, "er": {LEMMA: "være", "VerbForm": "Fin", "Tense": "Pres"},
"var": {LEMMA: "være", "VerbForm": "Fin", "Tense": "Past"} "var": {LEMMA: "være", "VerbForm": "Fin", "Tense": "Past"},
} },
} }
for tag, rules in MORPH_RULES.items(): for tag, rules in MORPH_RULES.items():
View File
@ -516,7 +516,7 @@ _exc = {
"øjeåbner": "øjenåbner", # 1 "øjeåbner": "øjenåbner", # 1
"økonomiministerium": "økonomiministerie", # 1 "økonomiministerium": "økonomiministerie", # 1
"ørenring": "ørering", # 2 "ørenring": "ørering", # 2
"øvehefte": "øvehæfte" # 1 "øvehefte": "øvehæfte", # 1
} }
View File
@ -6,17 +6,26 @@ from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
from ..punctuation import TOKENIZER_SUFFIXES from ..punctuation import TOKENIZER_SUFFIXES
_quotes = QUOTES.replace("'", '') _quotes = QUOTES.replace("'", "")
_infixes = (LIST_ELLIPSES + LIST_ICONS + _infixes = (
[r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER), LIST_ELLIPSES
r'(?<=[{a}])[,!?](?=[{a}])'.format(a=ALPHA), + LIST_ICONS
+ [
r"(?<=[{}])\.(?=[{}])".format(ALPHA_LOWER, ALPHA_UPPER),
r"(?<=[{a}])[,!?](?=[{a}])".format(a=ALPHA),
r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA), r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA), r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
r'(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])'.format(a=ALPHA, q=_quotes), r"(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])".format(a=ALPHA, q=_quotes),
r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA)]) r"(?<=[{a}])--(?=[{a}])".format(a=ALPHA),
]
)
_suffixes = [suffix for suffix in TOKENIZER_SUFFIXES if suffix not in ["'s", "'S", "s", "S", r"\'"]] _suffixes = [
suffix
for suffix in TOKENIZER_SUFFIXES
if suffix not in ["'s", "'S", "s", "S", r"\'"]
]
_suffixes += [r"(?<=[^sSxXzZ])\'"] _suffixes += [r"(?<=[^sSxXzZ])\'"]
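The pattern here is worth noting because several languages use it: copy the shared suffix rules, filter out the ones that don't apply, then append language-specific rules. A toy illustration with made-up rule strings (not spaCy's actual `TOKENIZER_SUFFIXES`):

```python
# Filter-then-extend pattern for per-language punctuation rules.
BASE_SUFFIXES = [r"\.", r"'s", r"'S", r"\'", r"%"]   # placeholder shared rules

_suffixes = [s for s in BASE_SUFFIXES if s not in ["'s", "'S", r"\'"]]
_suffixes += [r"(?<=[^sSxXzZ])\'"]                   # narrower replacement rule

print(_suffixes)  # ['\\.', '%', "(?<=[^sSxXzZ])\\'"]
```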
View File
@ -3,7 +3,8 @@ from __future__ import unicode_literals
# Source: Handpicked by Jens Dahl Møllerhøj. # Source: Handpicked by Jens Dahl Møllerhøj.
STOP_WORDS = set(""" STOP_WORDS = set(
"""
af aldrig alene alle allerede alligevel alt altid anden andet andre at af aldrig alene alle allerede alligevel alt altid anden andet andre at
bag begge blandt blev blive bliver burde bør bag begge blandt blev blive bliver burde bør
@ -43,4 +44,5 @@ ud uden udover under undtagen
var ved vi via vil ville vore vores vær være været var ved vi via vil ville vore vores vær være været
øvrigt øvrigt
""".split()) """.split()
)
View File
@ -51,73 +51,482 @@ for exc_data in [
{ORTH: "Tirs.", LEMMA: "tirsdag"}, {ORTH: "Tirs.", LEMMA: "tirsdag"},
{ORTH: "Ons.", LEMMA: "onsdag"}, {ORTH: "Ons.", LEMMA: "onsdag"},
{ORTH: "Fre.", LEMMA: "fredag"}, {ORTH: "Fre.", LEMMA: "fredag"},
{ORTH: "Lør.", LEMMA: "lørdag"}]: {ORTH: "Lør.", LEMMA: "lørdag"},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
# Specified case only # Specified case only
for orth in [ for orth in [
"diam.", "ib.", "mia.", "mik.", "pers.", "A.D.", "A/S", "B.C.", "BK.", "diam.",
"Dr.", "Boul.", "Chr.", "Dronn.", "H.K.H.", "H.M.", "Hf.", "i/s", "I/S", "ib.",
"Kprs.", "L.A.", "Ll.", "m/s", "M/S", "Mag.", "Mr.", "Ndr.", "Ph.d.", "mia.",
"Prs.", "Rcp.", "Sdr.", "Skt.", "Spl.", "Vg."]: "mik.",
"pers.",
"A.D.",
"A/S",
"B.C.",
"BK.",
"Dr.",
"Boul.",
"Chr.",
"Dronn.",
"H.K.H.",
"H.M.",
"Hf.",
"i/s",
"I/S",
"Kprs.",
"L.A.",
"Ll.",
"m/s",
"M/S",
"Mag.",
"Mr.",
"Ndr.",
"Ph.d.",
"Prs.",
"Rcp.",
"Sdr.",
"Skt.",
"Spl.",
"Vg.",
]:
_exc[orth] = [{ORTH: orth}] _exc[orth] = [{ORTH: orth}]
for orth in [ for orth in [
"aarh.", "ac.", "adj.", "adr.", "adsk.", "adv.", "afb.", "afd.", "afg.", "aarh.",
"afk.", "afs.", "aht.", "alg.", "alk.", "alm.", "amer.", "ang.", "ank.", "ac.",
"anl.", "anv.", "arb.", "arr.", "att.", "bd.", "bdt.", "beg.", "begr.", "adj.",
"beh.", "bet.", "bev.", "bhk.", "bib.", "bibl.", "bidr.", "bildl.", "adr.",
"bill.", "biol.", "bk.", "bl.", "bl.a.", "borgm.", "br.", "brolægn.", "adsk.",
"bto.", "bygn.", "ca.", "cand.", "d.d.", "d.m.", "d.s.", "d.s.s.", "adv.",
"d.y.", "d.å.", "d.æ.", "dagl.", "dat.", "dav.", "def.", "dek.", "dep.", "afb.",
"desl.", "dir.", "disp.", "distr.", "div.", "dkr.", "dl.", "do.", "afd.",
"dobb.", "dr.h.c", "dr.phil.", "ds.", "dvs.", "e.b.", "e.l.", "e.o.", "afg.",
"e.v.t.", "eftf.", "eftm.", "egl.", "eks.", "eksam.", "ekskl.", "eksp.", "afk.",
"ekspl.", "el.lign.", "emer.", "endv.", "eng.", "enk.", "etc.", "etym.", "afs.",
"eur.", "evt.", "exam.", "f.eks.", "f.m.", "f.n.", "f.o.", "f.o.m.", "aht.",
"f.s.v.", "f.t.", "f.v.t.", "f.å.", "fa.", "fakt.", "fam.", "ff.", "alg.",
"fg.", "fhv.", "fig.", "filol.", "filos.", "fl.", "flg.", "fm.", "fmd.", "alk.",
"fol.", "forb.", "foreg.", "foren.", "forf.", "fork.", "forr.", "fors.", "alm.",
"forsk.", "forts.", "fr.", "fr.u.", "frk.", "fsva.", "fuldm.", "fung.", "amer.",
"fx.", "fys.", "fær.", "g.d.", "g.m.", "gd.", "gdr.", "genuds.", "gl.", "ang.",
"gn.", "gns.", "gr.", "grdl.", "gross.", "h.a.", "h.c.", "hdl.", "ank.",
"henv.", "hhv.", "hj.hj.", "hj.spl.", "hort.", "hosp.", "hpl.", "hr.", "anl.",
"hrs.", "hum.", "hvp.", "i.e.", "id.", "if.", "iflg.", "ifm.", "ift.", "anv.",
"iht.", "ill.", "indb.", "indreg.", "inf.", "ing.", "inh.", "inj.", "arb.",
"inkl.", "insp.", "instr.", "isl.", "istf.", "it.", "ital.", "iv.", "arr.",
"jap.", "jf.", "jfr.", "jnr.", "j.nr.", "jr.", "jur.", "jvf.", "kap.", "att.",
"kbh.", "kem.", "kgl.", "kl.", "kld.", "knsp.", "komm.", "kons.", "bd.",
"korr.", "kp.", "kr.", "kst.", "kt.", "ktr.", "kv.", "kvt.", "l.c.", "bdt.",
"lab.", "lat.", "lb.m.", "lb.nr.", "lejl.", "lgd.", "lic.", "lign.", "beg.",
"lin.", "ling.merc.", "litt.", "loc.cit.", "lok.", "lrs.", "ltr.", "begr.",
"m.a.o.", "m.fl.", "m.m.", "m.v.", "m.v.h.", "maks.", "md.", "mdr.", "beh.",
"mdtl.", "mezz.", "mfl.", "m.h.p.", "m.h.t.", "mht.", "mill.", "mio.", "bet.",
"modt.", "mrk.", "mul.", "mv.", "n.br.", "n.f.", "nb.", "nedenst.", "bev.",
"nl.", "nr.", "nto.", "nuv.", "o/m", "o.a.", "o.fl.", "o.h.", "o.l.", "bhk.",
"o.lign.", "o.m.a.", "o.s.fr.", "obl.", "obs.", "odont.", "oecon.", "bib.",
"off.", "ofl.", "omg.", "omkr.", "omr.", "omtr.", "opg.", "opl.", "bibl.",
"opr.", "org.", "orig.", "osv.", "ovenst.", "overs.", "ovf.", "p.a.", "bidr.",
"p.b.a", "p.b.v", "p.c.", "p.m.", "p.m.v.", "p.n.", "p.p.", "p.p.s.", "bildl.",
"p.s.", "p.t.", "p.v.a.", "p.v.c.", "pag.", "pass.", "pcs.", "pct.", "bill.",
"pd.", "pens.", "pft.", "pg.", "pga.", "pgl.", "pinx.", "pk.", "pkt.", "biol.",
"polit.", "polyt.", "pos.", "pp.", "ppm.", "pr.", "prc.", "priv.", "bk.",
"prod.", "prof.", "pron.", "præd.", "præf.", "præt.", "psych.", "pt.", "bl.",
"pæd.", "q.e.d.", "rad.", "red.", "ref.", "reg.", "regn.", "rel.", "bl.a.",
"rep.", "repr.", "resp.", "rest.", "rm.", "rtg.", "russ.", "s.br.", "borgm.",
"s.d.", "s.f.", "s.m.b.a.", "s.u.", "s.å.", "sa.", "sb.", "sc.", "br.",
"scient.", "scil.", "sek.", "sekr.", "self.", "sem.", "shj.", "sign.", "brolægn.",
"sing.", "sj.", "skr.", "slutn.", "sml.", "smp.", "snr.", "soc.", "bto.",
"soc.dem.", "sp.", "spec.", "spm.", "spr.", "spsk.", "statsaut.", "st.", "bygn.",
"stk.", "str.", "stud.", "subj.", "subst.", "suff.", "sup.", "suppl.", "ca.",
"sv.", "såk.", "sædv.", "t/r", "t.h.", "t.o.", "t.o.m.", "t.v.", "tbl.", "cand.",
"tcp/ip", "td.", "tdl.", "tdr.", "techn.", "tekn.", "temp.", "th.", "d.d.",
"theol.", "tidl.", "tilf.", "tilh.", "till.", "tilsv.", "tjg.", "tkr.", "d.m.",
"tlf.", "tlgr.", "tr.", "trp.", "tsk.", "tv.", "ty.", "u/b", "udb.", "d.s.",
"udbet.", "ugtl.", "undt.", "v.f.", "vb.", "vedk.", "vedl.", "vedr.", "d.s.s.",
"vejl.", "vh.", "vha.", "vs.", "vsa.", "vær.", "zool.", "ø.lgd.", "d.y.",
"øvr.", "årg.", "årh."]: "d.å.",
"d.æ.",
"dagl.",
"dat.",
"dav.",
"def.",
"dek.",
"dep.",
"desl.",
"dir.",
"disp.",
"distr.",
"div.",
"dkr.",
"dl.",
"do.",
"dobb.",
"dr.h.c",
"dr.phil.",
"ds.",
"dvs.",
"e.b.",
"e.l.",
"e.o.",
"e.v.t.",
"eftf.",
"eftm.",
"egl.",
"eks.",
"eksam.",
"ekskl.",
"eksp.",
"ekspl.",
"el.lign.",
"emer.",
"endv.",
"eng.",
"enk.",
"etc.",
"etym.",
"eur.",
"evt.",
"exam.",
"f.eks.",
"f.m.",
"f.n.",
"f.o.",
"f.o.m.",
"f.s.v.",
"f.t.",
"f.v.t.",
"f.å.",
"fa.",
"fakt.",
"fam.",
"ff.",
"fg.",
"fhv.",
"fig.",
"filol.",
"filos.",
"fl.",
"flg.",
"fm.",
"fmd.",
"fol.",
"forb.",
"foreg.",
"foren.",
"forf.",
"fork.",
"forr.",
"fors.",
"forsk.",
"forts.",
"fr.",
"fr.u.",
"frk.",
"fsva.",
"fuldm.",
"fung.",
"fx.",
"fys.",
"fær.",
"g.d.",
"g.m.",
"gd.",
"gdr.",
"genuds.",
"gl.",
"gn.",
"gns.",
"gr.",
"grdl.",
"gross.",
"h.a.",
"h.c.",
"hdl.",
"henv.",
"hhv.",
"hj.hj.",
"hj.spl.",
"hort.",
"hosp.",
"hpl.",
"hr.",
"hrs.",
"hum.",
"hvp.",
"i.e.",
"id.",
"if.",
"iflg.",
"ifm.",
"ift.",
"iht.",
"ill.",
"indb.",
"indreg.",
"inf.",
"ing.",
"inh.",
"inj.",
"inkl.",
"insp.",
"instr.",
"isl.",
"istf.",
"it.",
"ital.",
"iv.",
"jap.",
"jf.",
"jfr.",
"jnr.",
"j.nr.",
"jr.",
"jur.",
"jvf.",
"kap.",
"kbh.",
"kem.",
"kgl.",
"kl.",
"kld.",
"knsp.",
"komm.",
"kons.",
"korr.",
"kp.",
"kr.",
"kst.",
"kt.",
"ktr.",
"kv.",
"kvt.",
"l.c.",
"lab.",
"lat.",
"lb.m.",
"lb.nr.",
"lejl.",
"lgd.",
"lic.",
"lign.",
"lin.",
"ling.merc.",
"litt.",
"loc.cit.",
"lok.",
"lrs.",
"ltr.",
"m.a.o.",
"m.fl.",
"m.m.",
"m.v.",
"m.v.h.",
"maks.",
"md.",
"mdr.",
"mdtl.",
"mezz.",
"mfl.",
"m.h.p.",
"m.h.t.",
"mht.",
"mill.",
"mio.",
"modt.",
"mrk.",
"mul.",
"mv.",
"n.br.",
"n.f.",
"nb.",
"nedenst.",
"nl.",
"nr.",
"nto.",
"nuv.",
"o/m",
"o.a.",
"o.fl.",
"o.h.",
"o.l.",
"o.lign.",
"o.m.a.",
"o.s.fr.",
"obl.",
"obs.",
"odont.",
"oecon.",
"off.",
"ofl.",
"omg.",
"omkr.",
"omr.",
"omtr.",
"opg.",
"opl.",
"opr.",
"org.",
"orig.",
"osv.",
"ovenst.",
"overs.",
"ovf.",
"p.a.",
"p.b.a",
"p.b.v",
"p.c.",
"p.m.",
"p.m.v.",
"p.n.",
"p.p.",
"p.p.s.",
"p.s.",
"p.t.",
"p.v.a.",
"p.v.c.",
"pag.",
"pass.",
"pcs.",
"pct.",
"pd.",
"pens.",
"pft.",
"pg.",
"pga.",
"pgl.",
"pinx.",
"pk.",
"pkt.",
"polit.",
"polyt.",
"pos.",
"pp.",
"ppm.",
"pr.",
"prc.",
"priv.",
"prod.",
"prof.",
"pron.",
"præd.",
"præf.",
"præt.",
"psych.",
"pt.",
"pæd.",
"q.e.d.",
"rad.",
"red.",
"ref.",
"reg.",
"regn.",
"rel.",
"rep.",
"repr.",
"resp.",
"rest.",
"rm.",
"rtg.",
"russ.",
"s.br.",
"s.d.",
"s.f.",
"s.m.b.a.",
"s.u.",
"s.å.",
"sa.",
"sb.",
"sc.",
"scient.",
"scil.",
"sek.",
"sekr.",
"self.",
"sem.",
"shj.",
"sign.",
"sing.",
"sj.",
"skr.",
"slutn.",
"sml.",
"smp.",
"snr.",
"soc.",
"soc.dem.",
"sp.",
"spec.",
"spm.",
"spr.",
"spsk.",
"statsaut.",
"st.",
"stk.",
"str.",
"stud.",
"subj.",
"subst.",
"suff.",
"sup.",
"suppl.",
"sv.",
"såk.",
"sædv.",
"t/r",
"t.h.",
"t.o.",
"t.o.m.",
"t.v.",
"tbl.",
"tcp/ip",
"td.",
"tdl.",
"tdr.",
"techn.",
"tekn.",
"temp.",
"th.",
"theol.",
"tidl.",
"tilf.",
"tilh.",
"till.",
"tilsv.",
"tjg.",
"tkr.",
"tlf.",
"tlgr.",
"tr.",
"trp.",
"tsk.",
"tv.",
"ty.",
"u/b",
"udb.",
"udbet.",
"ugtl.",
"undt.",
"v.f.",
"vb.",
"vedk.",
"vedl.",
"vedr.",
"vejl.",
"vh.",
"vha.",
"vs.",
"vsa.",
"vær.",
"zool.",
"ø.lgd.",
"øvr.",
"årg.",
"årh.",
]:
_exc[orth] = [{ORTH: orth}] _exc[orth] = [{ORTH: orth}]
capitalized = orth.capitalize() capitalized = orth.capitalize()
_exc[capitalized] = [{ORTH: capitalized}] _exc[capitalized] = [{ORTH: capitalized}]
@ -138,7 +547,8 @@ for exc_data in [
{ORTH: "ha'", LEMMA: "have", NORM: "have"}, {ORTH: "ha'", LEMMA: "have", NORM: "have"},
{ORTH: "Ha'", LEMMA: "have", NORM: "have"}, {ORTH: "Ha'", LEMMA: "have", NORM: "have"},
{ORTH: "ik'", LEMMA: "ikke", NORM: "ikke"}, {ORTH: "ik'", LEMMA: "ikke", NORM: "ikke"},
{ORTH: "Ik'", LEMMA: "ikke", NORM: "ikke"}]: {ORTH: "Ik'", LEMMA: "ikke", NORM: "ikke"},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
@ -147,11 +557,7 @@ for h in range(1, 31 + 1):
for period in ["."]: for period in ["."]:
_exc["%d%s" % (h, period)] = [{ORTH: "%d." % h}] _exc["%d%s" % (h, period)] = [{ORTH: "%d." % h}]
_custom_base_exc = { _custom_base_exc = {"i.": [{ORTH: "i", LEMMA: "i", NORM: "i"}, {ORTH: ".", TAG: PUNCT}]}
"i.": [
{ORTH: "i", LEMMA: "i", NORM: "i"},
{ORTH: ".", TAG: PUNCT}]
}
_exc.update(_custom_base_exc) _exc.update(_custom_base_exc)
TOKENIZER_EXCEPTIONS = _exc TOKENIZER_EXCEPTIONS = _exc
View File
@ -18,9 +18,10 @@ from ...util import update_exc, add_lookups
class GermanDefaults(Language.Defaults): class GermanDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: 'de' lex_attr_getters[LANG] = lambda text: "de"
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], lex_attr_getters[NORM] = add_lookups(
NORM_EXCEPTIONS, BASE_NORMS) Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
infixes = TOKENIZER_INFIXES infixes = TOKENIZER_INFIXES
tag_map = TAG_MAP tag_map = TAG_MAP
@ -30,8 +31,8 @@ class GermanDefaults(Language.Defaults):
class German(Language): class German(Language):
lang = 'de' lang = "de"
Defaults = GermanDefaults Defaults = GermanDefaults
__all__ = ['German'] __all__ = ["German"]
View File
@ -18,5 +18,5 @@ sentences = [
"San Francisco erwägt Verbot von Lieferrobotern", "San Francisco erwägt Verbot von Lieferrobotern",
"Autonome Fahrzeuge verlagern Haftpflicht auf Hersteller", "Autonome Fahrzeuge verlagern Haftpflicht auf Hersteller",
"Wo bist du?", "Wo bist du?",
"Was ist die Hauptstadt von Deutschland?" "Was ist die Hauptstadt von Deutschland?",
] ]
View File
@ -6,9 +6,7 @@ from __future__ import unicode_literals
# old vs. new spelling rules, and all possible cases. # old vs. new spelling rules, and all possible cases.
_exc = { _exc = {"daß": "dass"}
"daß": "dass"
}
NORM_EXCEPTIONS = {} NORM_EXCEPTIONS = {}
View File
@ -5,16 +5,21 @@ from ..char_classes import LIST_ELLIPSES, LIST_ICONS
from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
_quotes = QUOTES.replace("'", '') _quotes = QUOTES.replace("'", "")
_infixes = (LIST_ELLIPSES + LIST_ICONS + _infixes = (
[r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER), LIST_ELLIPSES
r'(?<=[{a}])[,!?](?=[{a}])'.format(a=ALPHA), + LIST_ICONS
+ [
r"(?<=[{}])\.(?=[{}])".format(ALPHA_LOWER, ALPHA_UPPER),
r"(?<=[{a}])[,!?](?=[{a}])".format(a=ALPHA),
r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA), r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA), r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
r'(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])'.format(a=ALPHA, q=_quotes), r"(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])".format(a=ALPHA, q=_quotes),
r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA), r"(?<=[{a}])--(?=[{a}])".format(a=ALPHA),
r'(?<=[0-9])-(?=[0-9])']) r"(?<=[0-9])-(?=[0-9])",
]
)
TOKENIZER_INFIXES = _infixes TOKENIZER_INFIXES = _infixes
View File
@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
STOP_WORDS = set(""" STOP_WORDS = set(
"""
á a ab aber ach acht achte achten achter achtes ag alle allein allem allen á a ab aber ach acht achte achten achter achtes ag alle allein allem allen
aller allerdings alles allgemeinen als also am an andere anderen andern anders aller allerdings alles allgemeinen als also am an andere anderen andern anders
auch auf aus ausser außer ausserdem außerdem auch auf aus ausser außer ausserdem außerdem
@ -78,4 +79,5 @@ wollt wollte wollten worden wurde würde wurden würden
zehn zehnte zehnten zehnter zehntes zeit zu zuerst zugleich zum zunächst zur zehn zehnte zehnten zehnter zehntes zeit zu zuerst zugleich zum zunächst zur
zurück zusammen zwanzig zwar zwei zweite zweiten zweiter zweites zwischen zurück zusammen zwanzig zwar zwei zweite zweiten zweiter zweites zwischen
""".split()) """.split()
)
View File
@ -13,11 +13,24 @@ def noun_chunks(obj):
# measurement construction, the span is sometimes extended to the right of # measurement construction, the span is sometimes extended to the right of
# the NOUN. Example: "eine Tasse Tee" (a cup (of) tea) returns "eine Tasse Tee" # the NOUN. Example: "eine Tasse Tee" (a cup (of) tea) returns "eine Tasse Tee"
# and not just "eine Tasse", same for "das Thema Familie". # and not just "eine Tasse", same for "das Thema Familie".
labels = ['sb', 'oa', 'da', 'nk', 'mo', 'ag', 'ROOT', 'root', 'cj', 'pd', 'og', 'app'] labels = [
"sb",
"oa",
"da",
"nk",
"mo",
"ag",
"ROOT",
"root",
"cj",
"pd",
"og",
"app",
]
doc = obj.doc # Ensure works on both Doc and Span. doc = obj.doc # Ensure works on both Doc and Span.
np_label = doc.vocab.strings.add('NP') np_label = doc.vocab.strings.add("NP")
np_deps = set(doc.vocab.strings.add(label) for label in labels) np_deps = set(doc.vocab.strings.add(label) for label in labels)
close_app = doc.vocab.strings.add('nk') close_app = doc.vocab.strings.add("nk")
rbracket = 0 rbracket = 0
for i, word in enumerate(obj): for i, word in enumerate(obj):
@ -33,6 +46,4 @@ def noun_chunks(obj):
yield word.left_edge.i, rbracket, np_label yield word.left_edge.i, rbracket, np_label
SYNTAX_ITERATORS = { SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}
'noun_chunks': noun_chunks
}
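A hedged usage sketch of how the `noun_chunks` iterator registered above is consumed. The model name `de_core_news_sm` is an assumption for illustration and not part of this diff; any German pipeline with a parser would do:

```python
import spacy

# Assumes a German model is installed, e.g. python -m spacy download de_core_news_sm
nlp = spacy.load("de_core_news_sm")
doc = nlp("Eine Tasse Tee steht auf dem Tisch.")

# doc.noun_chunks is driven by the iterator in SYNTAX_ITERATORS.
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.dep_)
```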
View File
@ -62,5 +62,5 @@ TAG_MAP = {
"VVIZU": {POS: VERB, "VerbForm": "inf"}, "VVIZU": {POS: VERB, "VerbForm": "inf"},
"VVPP": {POS: VERB, "Aspect": "perf", "VerbForm": "part"}, "VVPP": {POS: VERB, "Aspect": "perf", "VerbForm": "part"},
"XY": {POS: X}, "XY": {POS: X},
"_SP": {POS: SPACE} "_SP": {POS: SPACE},
} }
View File
@ -5,49 +5,41 @@ from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
_exc = { _exc = {
"auf'm": [ "auf'm": [{ORTH: "auf", LEMMA: "auf"}, {ORTH: "'m", LEMMA: "der", NORM: "dem"}],
{ORTH: "auf", LEMMA: "auf"},
{ORTH: "'m", LEMMA: "der", NORM: "dem"}],
"du's": [ "du's": [
{ORTH: "du", LEMMA: PRON_LEMMA, TAG: "PPER"}, {ORTH: "du", LEMMA: PRON_LEMMA, TAG: "PPER"},
{ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"}], {ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"},
],
"er's": [ "er's": [
{ORTH: "er", LEMMA: PRON_LEMMA, TAG: "PPER"}, {ORTH: "er", LEMMA: PRON_LEMMA, TAG: "PPER"},
{ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"}], {ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"},
],
"hinter'm": [ "hinter'm": [
{ORTH: "hinter", LEMMA: "hinter"}, {ORTH: "hinter", LEMMA: "hinter"},
{ORTH: "'m", LEMMA: "der", NORM: "dem"}], {ORTH: "'m", LEMMA: "der", NORM: "dem"},
],
"ich's": [ "ich's": [
{ORTH: "ich", LEMMA: PRON_LEMMA, TAG: "PPER"}, {ORTH: "ich", LEMMA: PRON_LEMMA, TAG: "PPER"},
{ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"}], {ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"},
],
"ihr's": [ "ihr's": [
{ORTH: "ihr", LEMMA: PRON_LEMMA, TAG: "PPER"}, {ORTH: "ihr", LEMMA: PRON_LEMMA, TAG: "PPER"},
{ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"}], {ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"},
],
"sie's": [ "sie's": [
{ORTH: "sie", LEMMA: PRON_LEMMA, TAG: "PPER"}, {ORTH: "sie", LEMMA: PRON_LEMMA, TAG: "PPER"},
{ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"}], {ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"},
],
"unter'm": [ "unter'm": [
{ORTH: "unter", LEMMA: "unter"}, {ORTH: "unter", LEMMA: "unter"},
{ORTH: "'m", LEMMA: "der", NORM: "dem"}], {ORTH: "'m", LEMMA: "der", NORM: "dem"},
],
"vor'm": [ "vor'm": [{ORTH: "vor", LEMMA: "vor"}, {ORTH: "'m", LEMMA: "der", NORM: "dem"}],
{ORTH: "vor", LEMMA: "vor"},
{ORTH: "'m", LEMMA: "der", NORM: "dem"}],
"wir's": [ "wir's": [
{ORTH: "wir", LEMMA: PRON_LEMMA, TAG: "PPER"}, {ORTH: "wir", LEMMA: PRON_LEMMA, TAG: "PPER"},
{ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"}], {ORTH: "'s", LEMMA: PRON_LEMMA, TAG: "PPER", NORM: "es"},
],
"über'm": [ "über'm": [{ORTH: "über", LEMMA: "über"}, {ORTH: "'m", LEMMA: "der", NORM: "dem"}],
{ORTH: "über", LEMMA: "über"},
{ORTH: "'m", LEMMA: "der", NORM: "dem"}]
} }
@ -162,21 +154,95 @@ for exc_data in [
{ORTH: "z.Zt.", LEMMA: "zur Zeit"}, {ORTH: "z.Zt.", LEMMA: "zur Zeit"},
{ORTH: "z.b.", LEMMA: "zum Beispiel"}, {ORTH: "z.b.", LEMMA: "zum Beispiel"},
{ORTH: "zzgl.", LEMMA: "zuzüglich"}, {ORTH: "zzgl.", LEMMA: "zuzüglich"},
{ORTH: "österr.", LEMMA: "österreichisch", NORM: "österreichisch"}]: {ORTH: "österr.", LEMMA: "österreichisch", NORM: "österreichisch"},
]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
for orth in [ for orth in [
"A.C.", "a.D.", "A.D.", "A.G.", "a.M.", "a.Z.", "Abs.", "adv.", "al.", "A.C.",
"B.A.", "B.Sc.", "betr.", "biol.", "Biol.", "ca.", "Chr.", "Cie.", "co.", "a.D.",
"Co.", "D.C.", "Dipl.-Ing.", "Dipl.", "Dr.", "e.g.", "e.V.", "ehem.", "A.D.",
"entspr.", "erm.", "etc.", "ev.", "G.m.b.H.", "geb.", "Gebr.", "gem.", "A.G.",
"h.c.", "Hg.", "hrsg.", "Hrsg.", "i.A.", "i.e.", "i.G.", "i.Tr.", "i.V.", "a.M.",
"Ing.", "jr.", "Jr.", "jun.", "jur.", "K.O.", "L.A.", "lat.", "M.A.", "a.Z.",
"m.E.", "m.M.", "M.Sc.", "Mr.", "N.Y.", "N.Y.C.", "nat.", "o.a.", "Abs.",
"o.ä.", "o.g.", "o.k.", "O.K.", "p.a.", "p.s.", "P.S.", "pers.", "phil.", "adv.",
"q.e.d.", "R.I.P.", "rer.", "sen.", "St.", "std.", "u.a.", "U.S.", "U.S.A.", "al.",
"U.S.S.", "Vol.", "vs.", "wiss."]: "B.A.",
"B.Sc.",
"betr.",
"biol.",
"Biol.",
"ca.",
"Chr.",
"Cie.",
"co.",
"Co.",
"D.C.",
"Dipl.-Ing.",
"Dipl.",
"Dr.",
"e.g.",
"e.V.",
"ehem.",
"entspr.",
"erm.",
"etc.",
"ev.",
"G.m.b.H.",
"geb.",
"Gebr.",
"gem.",
"h.c.",
"Hg.",
"hrsg.",
"Hrsg.",
"i.A.",
"i.e.",
"i.G.",
"i.Tr.",
"i.V.",
"Ing.",
"jr.",
"Jr.",
"jun.",
"jur.",
"K.O.",
"L.A.",
"lat.",
"M.A.",
"m.E.",
"m.M.",
"M.Sc.",
"Mr.",
"N.Y.",
"N.Y.C.",
"nat.",
"o.a.",
"o.ä.",
"o.g.",
"o.k.",
"O.K.",
"p.a.",
"p.s.",
"P.S.",
"pers.",
"phil.",
"q.e.d.",
"R.I.P.",
"rer.",
"sen.",
"St.",
"std.",
"u.a.",
"U.S.",
"U.S.A.",
"U.S.S.",
"Vol.",
"vs.",
"wiss.",
]:
_exc[orth] = [{ORTH: orth}] _exc[orth] = [{ORTH: orth}]
View File
@ -21,9 +21,10 @@ from ...util import update_exc, add_lookups
class GreekDefaults(Language.Defaults): class GreekDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS) lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = lambda text: 'el' # ISO code lex_attr_getters[LANG] = lambda text: "el" # ISO code
lex_attr_getters[NORM] = add_lookups( lex_attr_getters[NORM] = add_lookups(
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS) Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
stop_words = STOP_WORDS stop_words = STOP_WORDS
tag_map = TAG_MAP tag_map = TAG_MAP
@ -37,15 +38,16 @@ class GreekDefaults(Language.Defaults):
lemma_rules = LEMMA_RULES lemma_rules = LEMMA_RULES
lemma_index = LEMMA_INDEX lemma_index = LEMMA_INDEX
lemma_exc = LEMMA_EXC lemma_exc = LEMMA_EXC
return GreekLemmatizer(index=lemma_index, exceptions=lemma_exc, return GreekLemmatizer(
rules=lemma_rules) index=lemma_index, exceptions=lemma_exc, rules=lemma_rules
)
class Greek(Language): class Greek(Language):
lang = 'el' # ISO code lang = "el" # ISO code
Defaults = GreekDefaults # set Defaults to custom language defaults Defaults = GreekDefaults # set Defaults to custom language defaults
# set default export this allows the language class to be lazy-loaded # set default export this allows the language class to be lazy-loaded
__all__ = ['Greek'] __all__ = ["Greek"]
View File
@ -9,20 +9,20 @@ Example sentences to test spaCy and its language models.
""" """
sentences = [ sentences = [
'''Η άνιση κατανομή του πλούτου και του εισοδήματος, η οποία έχει λάβει """Η άνιση κατανομή του πλούτου και του εισοδήματος, η οποία έχει λάβει
τρομερές διαστάσεις, δεν δείχνει τάσεις βελτίωσης.''', τρομερές διαστάσεις, δεν δείχνει τάσεις βελτίωσης.""",
'''Ο στόχος της σύντομης αυτής έκθεσης είναι να συνοψίσει τα κυριότερα """Ο στόχος της σύντομης αυτής έκθεσης είναι να συνοψίσει τα κυριότερα
συμπεράσματα των επισκοπήσεων κάθε μιας χώρας.''', συμπεράσματα των επισκοπήσεων κάθε μιας χώρας.""",
'''Μέχρι αργά χθες το βράδυ ο πλοιοκτήτης παρέμενε έξω από το γραφείο του """Μέχρι αργά χθες το βράδυ ο πλοιοκτήτης παρέμενε έξω από το γραφείο του
γενικού γραμματέα του υπουργείου, ενώ είχε μόνον τηλεφωνική επικοινωνία με γενικού γραμματέα του υπουργείου, ενώ είχε μόνον τηλεφωνική επικοινωνία με
τον υπουργό.''', τον υπουργό.""",
'''Σύμφωνα με καλά ενημερωμένη πηγή, από την επεξεργασία του προέκυψε ότι """Σύμφωνα με καλά ενημερωμένη πηγή, από την επεξεργασία του προέκυψε ότι
οι δράστες της επίθεσης ήταν δύο, καθώς και ότι προσέγγισαν και αποχώρησαν οι δράστες της επίθεσης ήταν δύο, καθώς και ότι προσέγγισαν και αποχώρησαν
από το σημείο με μοτοσικλέτα.''', από το σημείο με μοτοσικλέτα.""",
"Η υποδομή καταλυμάτων στην Ελλάδα είναι πλήρης και ανανεώνεται συνεχώς.", "Η υποδομή καταλυμάτων στην Ελλάδα είναι πλήρης και ανανεώνεται συνεχώς.",
'''Το επείγον ταχυδρομείο (ήτοι το παραδοτέο εντός 48 ωρών το πολύ) μπορεί """Το επείγον ταχυδρομείο (ήτοι το παραδοτέο εντός 48 ωρών το πολύ) μπορεί
να μεταφέρεται αεροπορικώς μόνον εφόσον εφαρμόζονται οι κανόνες να μεταφέρεται αεροπορικώς μόνον εφόσον εφαρμόζονται οι κανόνες
ασφαλείας''', ασφαλείας""",
''''Στις ορεινές περιοχές του νησιού οι χιονοπτώσεις και οι παγετοί είναι """'Στις ορεινές περιοχές του νησιού οι χιονοπτώσεις και οι παγετοί είναι
περιορισμένοι ενώ στις παραθαλάσσιες περιοχές σημειώνονται σπανίως.''' περιορισμένοι ενώ στις παραθαλάσσιες περιοχές σημειώνονται σπανίως.""",
] ]
View File
@ -12,10 +12,19 @@ from ._verbs import VERBS
from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES, PUNCT_RULES from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES, PUNCT_RULES
LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS} LEMMA_INDEX = {"adj": ADJECTIVES, "adv": ADVERBS, "noun": NOUNS, "verb": VERBS}
LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES, LEMMA_RULES = {
'punct': PUNCT_RULES} "adj": ADJECTIVE_RULES,
"noun": NOUN_RULES,
"verb": VERB_RULES,
"punct": PUNCT_RULES,
}
LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'noun': NOUNS_IRREG, 'det': DETS_IRREG, 'verb': VERBS_IRREG} LEMMA_EXC = {
"adj": ADJECTIVES_IRREG,
"noun": NOUNS_IRREG,
"det": DETS_IRREG,
"verb": VERBS_IRREG,
}
View File
@ -1,6 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
ADJECTIVES = set("""
ADJECTIVES = set(
"""
n-διάστατος µεταφυτρωτικός άβαθος άβαλτος άβαρος άβατος άβαφος άβγαλτος άβιος n-διάστατος µεταφυτρωτικός άβαθος άβαλτος άβαρος άβατος άβαφος άβγαλτος άβιος
άβλαπτος άβλεπτος άβολος άβουλος άβραστος άβρεχτος άβροχος άβυθος άγαμος άβλαπτος άβλεπτος άβολος άβουλος άβραστος άβρεχτος άβροχος άβυθος άγαμος
άγγιχτος άγδαρτος άγδυτος άγευστος άγιος άγλυκος άγλωσσος άγναθος άγναντος άγγιχτος άγδαρτος άγδυτος άγευστος άγιος άγλυκος άγλωσσος άγναθος άγναντος
@ -2438,4 +2440,5 @@ ADJECTIVES = set("""
όμορφος όνειος όξινος όρθιος όσιος όφκαιρος όψια όψιμος ύπανδρος ύπατος όμορφος όνειος όξινος όρθιος όσιος όφκαιρος όψια όψιμος ύπανδρος ύπατος
ύπουλος ύπτιος ύστατος ύστερος ύψιστος ώριμος ώριος ἀγκυλωτός ἀκαταμέτρητος ύπουλος ύπτιος ύστατος ύστερος ύψιστος ώριμος ώριος ἀγκυλωτός ἀκαταμέτρητος
ἄπειρος ἄτροπος ἐλαφρός ἐνεστώς ἐνυπόστατος ἔναυλος ἥττων ἰσχυρός ἵστωρ ἄπειρος ἄτροπος ἐλαφρός ἐνεστώς ἐνυπόστατος ἔναυλος ἥττων ἰσχυρός ἵστωρ
""".split()) """.split()
)
View File
@ -32,5 +32,4 @@ ADJECTIVES_IRREG = {
"πολύς": ("πολύ",), "πολύς": ("πολύ",),
"πολλύ": ("πολύ",), "πολλύ": ("πολύ",),
"πολλύς": ("πολύ",), "πολλύς": ("πολύ",),
} }
View File
@ -1,6 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
ADVERBS = set("""
ADVERBS = set(
"""
άβλαβα άβολα άβουλα άγαν άγαρμπα άγγιχτα άγνωμα άγρια άγρυπνα άδηλα άδικα άβλαβα άβολα άβουλα άγαν άγαρμπα άγγιχτα άγνωμα άγρια άγρυπνα άδηλα άδικα
άδοξα άθελα άθλια άκαιρα άκακα άκαμπτα άκαρδα άκαρπα άκεφα άκομψα άκοπα άκοσμα άδοξα άθελα άθλια άκαιρα άκακα άκαμπτα άκαρδα άκαρπα άκεφα άκομψα άκοπα άκοσμα
άκρως άκυρα άλαλα άλιωτα άλλοθεν άλλοτε άλλως άλλωστε άλογα άλυπα άμεμπτα άκρως άκυρα άλαλα άλιωτα άλλοθεν άλλοτε άλλως άλλωστε άλογα άλυπα άμεμπτα
@ -861,4 +863,5 @@ ADVERBS = set("""
ψυχραντικά ψωροπερήφανα ψόφια ψύχραιμα ωδικώς ωμά ωρίμως ωραία ωραιότατα ψυχραντικά ψωροπερήφανα ψόφια ψύχραιμα ωδικώς ωμά ωρίμως ωραία ωραιότατα
ωριαία ωριαίως ως ωσαύτως ωσεί ωφέλιμα ωφελίμως ωφελιμιστικά ωχρά όθε όθεν όλο ωριαία ωριαίως ως ωσαύτως ωσεί ωφέλιμα ωφελίμως ωφελιμιστικά ωχρά όθε όθεν όλο
όμορφα όντως όξω όπισθεν όπου όπως όρθια όρτσα όσια όσο όχι όψιμα ύπερθεν όμορφα όντως όξω όπισθεν όπου όπως όρθια όρτσα όσια όσο όχι όψιμα ύπερθεν
""".split()) """.split()
)
View File
@ -1,5 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
DETS = set("""
DETS = set(
"""
ένας η ο το τη ένας η ο το τη
""".split()) """.split()
)
View File
@ -140,17 +140,7 @@ VERB_RULES = [
["ξουμε", "ζω"], ["ξουμε", "ζω"],
["ξετε", "ζω"], ["ξετε", "ζω"],
["ξουν", "ζω"], ["ξουν", "ζω"],
] ]
PUNCT_RULES = [ PUNCT_RULES = [["", '"'], ["", '"'], ["\u2018", "'"], ["\u2019", "'"]]
["", "\""],
["", "\""],
["\u2018", "'"],
["\u2019", "'"]
]
View File
@ -1,6 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
NOUNS = set("""
NOUNS = set(
"""
-αλγία -βατώ -βατῶ -ούλα -πληξία -ώνυμο sofa table άβακας άβατο άβατον άβυσσος -αλγία -βατώ -βατῶ -ούλα -πληξία -ώνυμο sofa table άβακας άβατο άβατον άβυσσος
άγανο άγαρ άγγελμα άγγελος άγγιγμα άγγισμα άγγλος άγημα άγιασμα άγιο φως άγανο άγαρ άγγελμα άγγελος άγγιγμα άγγισμα άγγλος άγημα άγιασμα άγιο φως
άγκλισμα άγκυρα άγμα άγνοια άγνωστος άγονο άγος άγουρος άγουσα άγρα άγρευμα άγκλισμα άγκυρα άγμα άγνοια άγνωστος άγονο άγος άγουρος άγουσα άγρα άγρευμα
@ -6066,4 +6068,5 @@ NOUNS = set("""
ἐντευκτήριον ἐντόσθια ἐξοικείωσις ἐξοχή ἐξωκκλήσιον ἐπίσκεψις ἐπίσχεστρον ἐντευκτήριον ἐντόσθια ἐξοικείωσις ἐξοχή ἐξωκκλήσιον ἐπίσκεψις ἐπίσχεστρον
ἐρωτίς ἑρμηνεία ἔκθλιψις ἔκτισις ἔκτρωμα ἔπαλξις ἱππάρχας ἱππάρχης ἴς ἵππαρχος ἐρωτίς ἑρμηνεία ἔκθλιψις ἔκτισις ἔκτρωμα ἔπαλξις ἱππάρχας ἱππάρχης ἴς ἵππαρχος
ὑστερικός ὕστερον ὠάριον ὠοθήκη ὠοθηκῖτις ὠοθυλάκιον ὠορρηξία ὠοσκόπιον ὑστερικός ὕστερον ὠάριον ὠοθήκη ὠοθηκῖτις ὠοθυλάκιον ὠορρηξία ὠοσκόπιον
""".split()) """.split()
)
View File
@ -1,6 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
PARTICIPLES = set("""
PARTICIPLES = set(
"""
έρποντας έχοντας αβανιάζοντας αβγατισμένος αγαπημένος αγαπώντας αγγίζοντας έρποντας έχοντας αβανιάζοντας αβγατισμένος αγαπημένος αγαπώντας αγγίζοντας
αγγιγμένος αγιασμένος αγιογραφώντας αγιοποιημένος αγιοποιώντας αγκαζαρισμένος αγγιγμένος αγιασμένος αγιογραφώντας αγιοποιημένος αγιοποιώντας αγκαζαρισμένος
αγκιστρωμένος αγκυλωμένος αγκυροβολημένος αγλακώντας αγνοημένος αγνοούμενος αγκιστρωμένος αγκυλωμένος αγκυροβολημένος αγλακώντας αγνοημένος αγνοούμενος
@ -941,4 +943,5 @@ PARTICIPLES = set("""
ψιλούμενος ψοφολογώντας ψυχογραφώντας ψυχολογημένος ψυχομαχώντας ψυχομαχώντας ψιλούμενος ψοφολογώντας ψυχογραφώντας ψυχολογημένος ψυχομαχώντας ψυχομαχώντας
ψυχορραγώντας ψυχρηλατώντας ψυχωμένος ψωμοζητώντας ψωμοζώντας ψωμωμένος ψυχορραγώντας ψυχρηλατώντας ψυχωμένος ψωμοζητώντας ψωμοζώντας ψωμωμένος
ωθηθείς ωθώντας ωραιοποιημένος ωραιοποιώντας ωρυόμενος ωτοσκοπώντας όντας ωθηθείς ωθώντας ωραιοποιημένος ωραιοποιώντας ωρυόμενος ωτοσκοπώντας όντας
""".split()) """.split()
)
View File
@ -1,5 +1,8 @@
# coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
PROPER_NAMES = set("""
PROPER_NAMES = set(
"""
άαχεν άβαρος άβδηρα άβελ άβιλα άβολα άγγελοι άγγελος άγιο πνεύμα άαχεν άβαρος άβδηρα άβελ άβιλα άβολα άγγελοι άγγελος άγιο πνεύμα
άγιοι τόποι άγιον όρος άγιος αθανάσιος άγιος αναστάσιος άγιος αντώνιος άγιοι τόποι άγιον όρος άγιος αθανάσιος άγιος αναστάσιος άγιος αντώνιος
άγιος αριστείδης άγιος βαρθολομαίος άγιος βασίλειος άγιος βασίλης άγιος αριστείδης άγιος βαρθολομαίος άγιος βασίλειος άγιος βασίλης
@ -641,4 +644,5 @@ PROPER_NAMES = set("""
ωρολόγιον ωρωπός ωσηέ όγκα όγκατα όγκι όθρυς όθων όιτα όλγα όλιβερ όλυμπος ωρολόγιον ωρωπός ωσηέ όγκα όγκατα όγκι όθρυς όθων όιτα όλγα όλιβερ όλυμπος
όμουρα όμπιδος όνειρος όνο όρεγκον όσακι όσατο όσκαρ όσλο όταμα ότσου όφενμπαχ όμουρα όμπιδος όνειρος όνο όρεγκον όσακι όσατο όσκαρ όσλο όταμα ότσου όφενμπαχ
όχιρα ύδρα ύδρος ύψιστος ώλενος ώρες ώρχους ώστιν ἀλεξανδρούπολις ἀμαλιούπολις όχιρα ύδρα ύδρος ύψιστος ώλενος ώρες ώρχους ώστιν ἀλεξανδρούπολις ἀμαλιούπολις
""".split()) """.split()
)
View File
@ -1,6 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
VERBS = set("""
VERBS = set(
"""
'γγίζω άγομαι άγχομαι άγω άδω άπτομαι άπωσον άρχομαι άρχω άφτω έγκειται έκιοσε 'γγίζω άγομαι άγχομαι άγω άδω άπτομαι άπωσον άρχομαι άρχω άφτω έγκειται έκιοσε
έπομαι έρπω έρχομαι έστω έχω ήγγικεν ήθελε ίπταμαι ίσταμαι αίρομαι αίρω έπομαι έρπω έρχομαι έστω έχω ήγγικεν ήθελε ίπταμαι ίσταμαι αίρομαι αίρω
αβαντάρω αβαντζάρω αβαντσάρω αβαράρω αβασκαίνω αβγατίζω αβγαταίνω αβγοκόβω αβαντάρω αβαντζάρω αβαντσάρω αβαράρω αβασκαίνω αβγατίζω αβγαταίνω αβγοκόβω
@ -1186,4 +1188,5 @@ VERBS = set("""
ωρύομαι ωτακουστώ ωτοσκοπώ ωφελούμαι ωφελώ ωχραίνω ωχριώ όζω όψομαι ἀδικῶ ωρύομαι ωτακουστώ ωτοσκοπώ ωφελούμαι ωφελώ ωχραίνω ωχριώ όζω όψομαι ἀδικῶ
ἀκροῶμαι ἀλέθω ἀμελῶ ἀναπτερυγιάζω ἀναπτερώνω ἀναπτερώνω ἀνασαίνω ἀναταράσσω ἀκροῶμαι ἀλέθω ἀμελῶ ἀναπτερυγιάζω ἀναπτερώνω ἀναπτερώνω ἀνασαίνω ἀναταράσσω
ἀναφτερουγίζω ἀναφτερουγιάζω ἀναφτερώνω ἀναχωρίζω ἀντιμετρῶ ἀράζω ἀφοδεύω ἀναφτερουγίζω ἀναφτερουγιάζω ἀναφτερώνω ἀναχωρίζω ἀντιμετρῶ ἀράζω ἀφοδεύω
""".split()) """.split()
)
View File
@ -1,7 +1,6 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
VERBS_IRREG = { VERBS_IRREG = {
"είσαι": ("είμαι",), "είσαι": ("είμαι",),
"είναι": ("είμαι",), "είναι": ("είμαι",),
@ -196,5 +195,4 @@ VERBS_IRREG = {
"έφθασες": ("φτάνω",), "έφθασες": ("φτάνω",),
"έφθασε": ("φτάνω",), "έφθασε": ("φτάνω",),
"έφθασαν": ("φτάνω",), "έφθασαν": ("φτάνω",),
} }
View File
@ -1,34 +1,45 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals
import re import re
import pickle
from gensim.corpora.wikicorpus import extract_pages from gensim.corpora.wikicorpus import extract_pages
regex = re.compile(r'==={{(\w+)\|el}}===')
regex2 = re.compile(r'==={{(\w+ \w+)\|el}}===') regex = re.compile(r"==={{(\w+)\|el}}===")
regex2 = re.compile(r"==={{(\w+ \w+)\|el}}===")
# get words based on the Wiktionary dump # get words based on the Wiktionary dump
# check only for specific parts # check only for specific parts
# ==={{κύριο όνομα|el}}=== # ==={{κύριο όνομα|el}}===
expected_parts = ['μετοχή', 'ρήμα', 'επίθετο', expected_parts = [
'επίρρημα', 'ουσιαστικό', 'κύριο όνομα', 'άρθρο'] "μετοχή",
"ρήμα",
"επίθετο",
"επίρρημα",
"ουσιαστικό",
"κύριο όνομα",
"άρθρο",
]
unwanted_parts = ''' unwanted_parts = """
{'αναγραμματισμοί': 2, 'σύνδεσμος': 94, 'απαρέμφατο': 1, 'μορφή άρθρου': 1, 'ένθημα': 1, 'μερική συνωνυμία': 57, 'ορισμός': 1, 'σημείωση': 3, 'πρόσφυμα': 3, 'ταυτόσημα': 8, 'χαρακτήρας': 51, 'μορφή επιρρήματος': 1, 'εκφράσεις': 22, 'ρηματικό σχήμα': 3, 'πολυλεκτικό επίρρημα': 2, 'μόριο': 35, 'προφορά': 412, 'ρηματική έκφραση': 15, 'λογοπαίγνια': 2, 'πρόθεση': 46, 'ρηματικό επίθετο': 1, 'κατάληξη επιρρημάτων': 10, 'συναφείς όροι': 1, 'εξωτερικοί σύνδεσμοι': 1, 'αρσενικό γένος': 1, 'πρόθημα': 169, 'κατάληξη': 3, 'υπώνυμα': 7, 'επιφώνημα': 197, 'ρηματικός τύπος': 1, 'συντομομορφή': 560, 'μορφή ρήματος': 68282, 'μορφή επιθέτου': 61779, 'μορφές': 71, 'ιδιωματισμός': 2, 'πολυλεκτικός όρος': 719, 'πολυλεκτικό ουσιαστικό': 180, 'παράγωγα': 25, 'μορφή μετοχής': 806, 'μορφή αριθμητικού': 3, 'άκλιτο': 1, 'επίθημα': 181, 'αριθμητικό': 129, 'συγγενικά': 94, 'σημειώσεις': 45, 'Ιδιωματισμός': 1, 'ρητά': 12, 'φράση': 9, 'συνώνυμα': 556, 'μεταφράσεις': 1, 'κατάληξη ρημάτων': 15, 'σύνθετα': 27, 'υπερώνυμα': 1, 'εναλλακτικός τύπος': 22, 'μορφή ουσιαστικού': 35122, 'επιρρηματική έκφραση': 12, 'αντώνυμα': 76, 'βλέπε': 7, 'μορφή αντωνυμίας': 51, 'αντωνυμία': 100, 'κλίση': 11, 'σύνθετοι τύποι': 1, 'παροιμία': 5, 'μορφή_επιθέτου': 2, 'έκφραση': 738, 'σύμβολο': 8, 'πολυλεκτικό επίθετο': 1, 'ετυμολογία': 867} {'αναγραμματισμοί': 2, 'σύνδεσμος': 94, 'απαρέμφατο': 1, 'μορφή άρθρου': 1, 'ένθημα': 1, 'μερική συνωνυμία': 57, 'ορισμός': 1, 'σημείωση': 3, 'πρόσφυμα': 3, 'ταυτόσημα': 8, 'χαρακτήρας': 51, 'μορφή επιρρήματος': 1, 'εκφράσεις': 22, 'ρηματικό σχήμα': 3, 'πολυλεκτικό επίρρημα': 2, 'μόριο': 35, 'προφορά': 412, 'ρηματική έκφραση': 15, 'λογοπαίγνια': 2, 'πρόθεση': 46, 'ρηματικό επίθετο': 1, 'κατάληξη επιρρημάτων': 10, 'συναφείς όροι': 1, 'εξωτερικοί σύνδεσμοι': 1, 'αρσενικό γένος': 1, 'πρόθημα': 169, 'κατάληξη': 3, 'υπώνυμα': 7, 'επιφώνημα': 197, 'ρηματικός τύπος': 1, 'συντομομορφή': 560, 'μορφή ρήματος': 68282, 'μορφή επιθέτου': 61779, 'μορφές': 71, 'ιδιωματισμός': 2, 'πολυλεκτικός όρος': 719, 'πολυλεκτικό ουσιαστικό': 180, 'παράγωγα': 25, 'μορφή μετοχής': 806, 'μορφή αριθμητικού': 3, 'άκλιτο': 1, 'επίθημα': 181, 'αριθμητικό': 129, 'συγγενικά': 94, 'σημειώσεις': 45, 'Ιδιωματισμός': 1, 'ρητά': 12, 'φράση': 9, 'συνώνυμα': 556, 'μεταφράσεις': 1, 'κατάληξη ρημάτων': 15, 'σύνθετα': 27, 'υπερώνυμα': 1, 'εναλλακτικός τύπος': 22, 'μορφή ουσιαστικού': 35122, 'επιρρηματική έκφραση': 12, 'αντώνυμα': 76, 'βλέπε': 7, 'μορφή αντωνυμίας': 51, 'αντωνυμία': 100, 'κλίση': 11, 'σύνθετοι τύποι': 1, 'παροιμία': 5, 'μορφή_επιθέτου': 2, 'έκφραση': 738, 'σύμβολο': 8, 'πολυλεκτικό επίθετο': 1, 'ετυμολογία': 867}
''' """
wiktionary_file_path = '/data/gsoc2018-spacy/spacy/lang/el/res/elwiktionary-latest-pages-articles.xml' wiktionary_file_path = (
"/data/gsoc2018-spacy/spacy/lang/el/res/elwiktionary-latest-pages-articles.xml"
)
proper_names_dict = { proper_names_dict = {
'ουσιαστικό':'nouns', "ουσιαστικό": "nouns",
'επίθετο':'adjectives', "επίθετο": "adjectives",
'άρθρο':'dets', "άρθρο": "dets",
'επίρρημα':'adverbs', "επίρρημα": "adverbs",
'κύριο όνομα': 'proper_names', "κύριο όνομα": "proper_names",
'μετοχή': 'participles', "μετοχή": "participles",
'ρήμα': 'verbs' "ρήμα": "verbs",
} }
expected_parts_dict = {} expected_parts_dict = {}
for expected_part in expected_parts: for expected_part in expected_parts:
@ -36,7 +47,7 @@ for expected_part in expected_parts:
other_parts = {} other_parts = {}
for title, text, pageid in extract_pages(wiktionary_file_path): for title, text, pageid in extract_pages(wiktionary_file_path):
if text.startswith('#REDIRECT'): if text.startswith("#REDIRECT"):
continue continue
title = title.lower() title = title.lower()
all_regex = regex.findall(text) all_regex = regex.findall(text)
@ -47,20 +58,17 @@ for title, text, pageid in extract_pages(wiktionary_file_path):
for i in expected_parts_dict: for i in expected_parts_dict:
with open('_{0}.py'.format(proper_names_dict[i]), 'w') as f: with open("_{0}.py".format(proper_names_dict[i]), "w") as f:
f.write('from __future__ import unicode_literals\n') f.write("from __future__ import unicode_literals\n")
f.write('{} = set(\"\"\"\n'.format(proper_names_dict[i].upper())) f.write('{} = set("""\n'.format(proper_names_dict[i].upper()))
words = sorted(expected_parts_dict[i]) words = sorted(expected_parts_dict[i])
line = '' line = ""
to_write = [] to_write = []
for word in words: for word in words:
if len(line + ' ' + word) > 79: if len(line + " " + word) > 79:
to_write.append(line) to_write.append(line)
line = '' line = ""
else: else:
line = line + ' ' + word line = line + " " + word
f.write('\n'.join(to_write)) f.write("\n".join(to_write))
f.write('\n\"\"\".split())') f.write('\n""".split())')


@ -3,7 +3,9 @@ from __future__ import unicode_literals
from ....symbols import NOUN, VERB, ADJ, PUNCT from ....symbols import NOUN, VERB, ADJ, PUNCT
'''
class GreekLemmatizer(object):
"""
Greek language lemmatizer applies the default rule based lemmatization Greek language lemmatizer applies the default rule based lemmatization
procedure with some modifications for better Greek language support. procedure with some modifications for better Greek language support.
@ -11,10 +13,8 @@ The first modification is that it checks if the word for lemmatization is
already a lemma and if yes, it just returns it. already a lemma and if yes, it just returns it.
The second modification is about removing the base forms function which is The second modification is about removing the base forms function which is
not applicable for Greek language. not applicable for Greek language.
''' """
class GreekLemmatizer(object):
@classmethod @classmethod
def load(cls, path, index=None, exc=None, rules=None, lookup=None): def load(cls, path, index=None, exc=None, rules=None, lookup=None):
return cls(index, exc, rules, lookup) return cls(index, exc, rules, lookup)
@ -28,26 +28,29 @@ class GreekLemmatizer(object):
def __call__(self, string, univ_pos, morphology=None): def __call__(self, string, univ_pos, morphology=None):
if not self.rules: if not self.rules:
return [self.lookup_table.get(string, string)] return [self.lookup_table.get(string, string)]
if univ_pos in (NOUN, 'NOUN', 'noun'): if univ_pos in (NOUN, "NOUN", "noun"):
univ_pos = 'noun' univ_pos = "noun"
elif univ_pos in (VERB, 'VERB', 'verb'): elif univ_pos in (VERB, "VERB", "verb"):
univ_pos = 'verb' univ_pos = "verb"
elif univ_pos in (ADJ, 'ADJ', 'adj'): elif univ_pos in (ADJ, "ADJ", "adj"):
univ_pos = 'adj' univ_pos = "adj"
elif univ_pos in (PUNCT, 'PUNCT', 'punct'): elif univ_pos in (PUNCT, "PUNCT", "punct"):
univ_pos = 'punct' univ_pos = "punct"
else: else:
return list(set([string.lower()])) return list(set([string.lower()]))
lemmas = lemmatize(string, self.index.get(univ_pos, {}), lemmas = lemmatize(
string,
self.index.get(univ_pos, {}),
self.exc.get(univ_pos, {}), self.exc.get(univ_pos, {}),
self.rules.get(univ_pos, [])) self.rules.get(univ_pos, []),
)
return lemmas return lemmas
def lemmatize(string, index, exceptions, rules): def lemmatize(string, index, exceptions, rules):
string = string.lower() string = string.lower()
forms = [] forms = []
if (string in index): if string in index:
forms.append(string) forms.append(string)
return forms return forms
forms.extend(exceptions.get(string, [])) forms.extend(exceptions.get(string, []))
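For readers skimming the diff, here is a minimal standalone sketch of the lookup-then-rules behaviour the Greek lemmatizer docstring above describes. The suffix-rule step is not visible in the hunk and is reconstructed from spaCy's generic lemmatizer, and the `index`/`exceptions`/`rules` data below are made up for illustration.

```python
# Hedged sketch of the fallback order described above, with toy data.
# The real tables live in spacy/lang/el; the rule-application step here is an
# assumption reconstructed from spaCy's generic lemmatizer, since the hunk is cut off.
def lemmatize(string, index, exceptions, rules):
    string = string.lower()
    forms = []
    if string in index:  # first modification: the word is already a lemma
        forms.append(string)
        return forms
    forms.extend(exceptions.get(string, []))  # irregular forms
    if not forms:
        for old, new in rules:  # strip/replace suffixes, keep forms found in the index
            if string.endswith(old):
                form = string[: len(string) - len(old)] + new
                if form in index:
                    forms.append(form)
    if not forms:
        forms.append(string)  # last resort: return the surface form
    return forms


# toy example: "παιδιά" ("children") -> "παιδί" via a hypothetical "ιά" -> "ί" rule
print(lemmatize("παιδιά", index={"παιδί"}, exceptions={}, rules=[("ιά", "ί")]))
```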


@ -4,43 +4,100 @@ from __future__ import unicode_literals
from ...attrs import LIKE_NUM from ...attrs import LIKE_NUM
_num_words = ['μηδέν', 'ένας', 'δυο', 'δυό', 'τρεις', 'τέσσερις', 'πέντε', _num_words = [
'έξι', 'εφτά', 'επτά', 'οκτώ', 'οχτώ', "μηδέν",
'εννιά', 'εννέα', 'δέκα', 'έντεκα', 'ένδεκα', 'δώδεκα', "ένας",
'δεκατρείς', 'δεκατέσσερις', 'δεκαπέντε', 'δεκαέξι', 'δεκαεπτά', "δυο",
'δεκαοχτώ', 'δεκαεννέα', 'δεκαεννεα', 'είκοσι', 'τριάντα', "δυό",
'σαράντα', 'πενήντα', 'εξήντα', 'εβδομήντα', 'ογδόντα', "τρεις",
'ενενήντα', 'εκατό', 'διακόσιοι', 'διακόσοι', 'τριακόσιοι', "τέσσερις",
'τριακόσοι', 'τετρακόσιοι', 'τετρακόσοι', 'πεντακόσιοι', "πέντε",
'πεντακόσοι', 'εξακόσιοι', 'εξακόσοι', 'εφτακόσιοι', 'εφτακόσοι', "έξι",
'επτακόσιοι', 'επτακόσοι', 'οχτακόσιοι', 'οχτακόσοι', "εφτά",
'οκτακόσιοι', 'οκτακόσοι', 'εννιακόσιοι', 'χίλιοι', 'χιλιάδα', "επτά",
'εκατομμύριο', 'δισεκατομμύριο', 'τρισεκατομμύριο', 'τετράκις', "οκτώ",
'πεντάκις', 'εξάκις', 'επτάκις', 'οκτάκις', 'εννεάκις', 'ένα', "οχτώ",
'δύο', 'τρία', 'τέσσερα', 'δις', 'χιλιάδες'] "εννιά",
"εννέα",
"δέκα",
"έντεκα",
"ένδεκα",
"δώδεκα",
"δεκατρείς",
"δεκατέσσερις",
"δεκαπέντε",
"δεκαέξι",
"δεκαεπτά",
"δεκαοχτώ",
"δεκαεννέα",
"δεκαεννεα",
"είκοσι",
"τριάντα",
"σαράντα",
"πενήντα",
"εξήντα",
"εβδομήντα",
"ογδόντα",
"ενενήντα",
"εκατό",
"διακόσιοι",
"διακόσοι",
"τριακόσιοι",
"τριακόσοι",
"τετρακόσιοι",
"τετρακόσοι",
"πεντακόσιοι",
"πεντακόσοι",
"εξακόσιοι",
"εξακόσοι",
"εφτακόσιοι",
"εφτακόσοι",
"επτακόσιοι",
"επτακόσοι",
"οχτακόσιοι",
"οχτακόσοι",
"οκτακόσιοι",
"οκτακόσοι",
"εννιακόσιοι",
"χίλιοι",
"χιλιάδα",
"εκατομμύριο",
"δισεκατομμύριο",
"τρισεκατομμύριο",
"τετράκις",
"πεντάκις",
"εξάκις",
"επτάκις",
"οκτάκις",
"εννεάκις",
"ένα",
"δύο",
"τρία",
"τέσσερα",
"δις",
"χιλιάδες",
]
def like_num(text): def like_num(text):
if text.startswith(('+', '-', '±', '~')): if text.startswith(("+", "-", "±", "~")):
text = text[1:] text = text[1:]
text = text.replace(',', '').replace('.', '') text = text.replace(",", "").replace(".", "")
if text.isdigit(): if text.isdigit():
return True return True
if text.count('/') == 1: if text.count("/") == 1:
num, denom = text.split('/') num, denom = text.split("/")
if num.isdigit() and denom.isdigit(): if num.isdigit() and denom.isdigit():
return True return True
if text.count('^') == 1: if text.count("^") == 1:
num, denom = text.split('^') num, denom = text.split("^")
if num.isdigit() and denom.isdigit(): if num.isdigit() and denom.isdigit():
return True return True
if text.lower() in _num_words or text.lower().split(' ')[0] in _num_words: if text.lower() in _num_words or text.lower().split(" ")[0] in _num_words:
return True return True
if text in _num_words: if text in _num_words:
return True return True
return False return False
LEX_ATTRS = { LEX_ATTRS = {LIKE_NUM: like_num}
LIKE_NUM: like_num
}
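The reformatted `like_num` above is purely rule-based, so it can be exercised on a blank pipeline. A hedged usage sketch, assuming spaCy with the Greek language data from this branch is installed (`Greek` is the language class defined in `spacy/lang/el`):

```python
# Hedged sketch: LIKE_NUM on a blank Greek pipeline (no trained model needed).
from spacy.lang.el import Greek

nlp = Greek()
doc = nlp("Αγόρασα δεκαπέντε μήλα το 2018")  # "I bought fifteen apples in 2018"
print([(token.text, token.like_num) for token in doc])
# "δεκαπέντε" is listed in _num_words and "2018" is all digits,
# so both should come back with like_num == True
```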

View File

@ -3,8 +3,6 @@ from __future__ import unicode_literals
# These exceptions are used to add NORM values based on a token's ORTH value. # These exceptions are used to add NORM values based on a token's ORTH value.
# Norms are only set if no alternative is provided in the tokenizer exceptions. # Norms are only set if no alternative is provided in the tokenizer exceptions.
_exc = { _exc = {


@ -6,66 +6,91 @@ from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, LIST_CURRENCY
from ..char_classes import LIST_ICONS, ALPHA_LOWER, ALPHA_UPPER, ALPHA, HYPHENS from ..char_classes import LIST_ICONS, ALPHA_LOWER, ALPHA_UPPER, ALPHA, HYPHENS
from ..char_classes import QUOTES, CURRENCY from ..char_classes import QUOTES, CURRENCY
_units = ('km km² km³ m m² m³ dm dm² dm³ cm cm² cm³ mm mm² mm³ ha µm nm yd in ft ' _units = (
'kg g mg µg t lb oz m/s km/h kmh mph hPa Pa mbar mb MB kb KB gb GB tb ' "km km² km³ m m² m³ dm dm² dm³ cm cm² cm³ mm mm² mm³ ha µm nm yd in ft "
'TB T G M K км км² км³ м м² м³ дм дм² дм³ см см² см³ мм мм² мм³ нм ' "kg g mg µg t lb oz m/s km/h kmh mph hPa Pa mbar mb MB kb KB gb GB tb "
'кг г мг м/с км/ч кПа Па мбар Кб КБ кб Мб МБ мб Гб ГБ гб Тб ТБ тб') "TB T G M K км км² км³ м м² м³ дм дм² дм³ см см² см³ мм мм² мм³ нм "
"кг г мг м/с км/ч кПа Па мбар Кб КБ кб Мб МБ мб Гб ГБ гб Тб ТБ тб"
)
def merge_chars(char): return char.strip().replace(' ', '|') def merge_chars(char):
return char.strip().replace(" ", "|")
UNITS = merge_chars(_units) UNITS = merge_chars(_units)
_prefixes = (['\'\'', '§', '%', '=', r'\+[0-9]+%', # 90% _prefixes = (
r'\'([0-9]){2}([\-]\'([0-9]){2})*', # '12'-13 [
r'\-([0-9]){1,9}\.([0-9]){1,9}', # -12.13 "''",
r'\'([Α-Ωα-ωίϊΐόάέύϋΰήώ]+)\'', # 'αβγ' "§",
r'([Α-Ωα-ωίϊΐόάέύϋΰήώ]){1,3}\'', # αβγ' "%",
r'http://www.[A-Za-z]+\-[A-Za-z]+(\.[A-Za-z]+)+(\/[A-Za-z]+)*(\.[A-Za-z]+)*', "=",
r'[ΈΆΊΑ-Ωα-ωίϊΐόάέύϋΰήώ]+\*', # όνομα* r"\+[0-9]+%", # 90%
r'\$([0-9])+([\,\.]([0-9])+){0,1}', r"\'([0-9]){2}([\-]\'([0-9]){2})*", # '12'-13
] + LIST_PUNCT + LIST_ELLIPSES + LIST_QUOTES + r"\-([0-9]){1,9}\.([0-9]){1,9}", # -12.13
LIST_CURRENCY + LIST_ICONS) r"\'([Α-Ωα-ωίϊΐόάέύϋΰήώ]+)\'", # 'αβγ'
r"([Α-Ωα-ωίϊΐόάέύϋΰήώ]){1,3}\'", # αβγ'
r"http://www.[A-Za-z]+\-[A-Za-z]+(\.[A-Za-z]+)+(\/[A-Za-z]+)*(\.[A-Za-z]+)*",
r"[ΈΆΊΑ-Ωα-ωίϊΐόάέύϋΰήώ]+\*", # όνομα*
r"\$([0-9])+([\,\.]([0-9])+){0,1}",
]
+ LIST_PUNCT
+ LIST_ELLIPSES
+ LIST_QUOTES
+ LIST_CURRENCY
+ LIST_ICONS
)
_suffixes = (LIST_PUNCT + LIST_ELLIPSES + LIST_QUOTES + LIST_ICONS + _suffixes = (
[r'(?<=[0-9])\+', # 12+ LIST_PUNCT
r'([0-9])+\'', # 12' + LIST_ELLIPSES
r'([A-Za-z])?\'', # a' + LIST_QUOTES
r'^([0-9]){1,2}\.', # 12. + LIST_ICONS
r' ([0-9]){1,2}\.', # 12. + [
r'([0-9]){1}\) ', # 12) r"(?<=[0-9])\+", # 12+
r'^([0-9]){1}\)$', # 12) r"([0-9])+\'", # 12'
r'(?<=°[FfCcKk])\.', r"([A-Za-z])?\'", # a'
r'([0-9])+\&', # 12& r"^([0-9]){1,2}\.", # 12.
r'(?<=[0-9])(?:{})'.format(CURRENCY), r" ([0-9]){1,2}\.", # 12.
r'(?<=[0-9])(?:{})'.format(UNITS), r"([0-9]){1}\) ", # 12)
r'(?<=[0-9{}{}(?:{})])\.'.format(ALPHA_LOWER, r'²\-\)\]\+', QUOTES), r"^([0-9]){1}\)$", # 12)
r'(?<=[{a}][{a}])\.'.format(a=ALPHA_UPPER), r"(?<=°[FfCcKk])\.",
r'(?<=[Α-Ωα-ωίϊΐόάέύϋΰήώ])\-', # όνομα- r"([0-9])+\&", # 12&
r'(?<=[Α-Ωα-ωίϊΐόάέύϋΰήώ])\.', r"(?<=[0-9])(?:{})".format(CURRENCY),
r'^[Α-Ω]{1}\.', r"(?<=[0-9])(?:{})".format(UNITS),
r'\ [Α-Ω]{1}\.', r"(?<=[0-9{}{}(?:{})])\.".format(ALPHA_LOWER, r"²\-\)\]\+", QUOTES),
r"(?<=[{a}][{a}])\.".format(a=ALPHA_UPPER),
r"(?<=[Α-Ωα-ωίϊΐόάέύϋΰήώ])\-", # όνομα-
r"(?<=[Α-Ωα-ωίϊΐόάέύϋΰήώ])\.",
r"^[Α-Ω]{1}\.",
r"\ [Α-Ω]{1}\.",
# πρώτος-δεύτερος , πρώτος-δεύτερος-τρίτος # πρώτος-δεύτερος , πρώτος-δεύτερος-τρίτος
r'[ΈΆΊΑΌ-Ωα-ωίϊΐόάέύϋΰήώ]+([\-]([ΈΆΊΑΌ-Ωα-ωίϊΐόάέύϋΰήώ]+))+', r"[ΈΆΊΑΌ-Ωα-ωίϊΐόάέύϋΰήώ]+([\-]([ΈΆΊΑΌ-Ωα-ωίϊΐόάέύϋΰήώ]+))+",
r'([0-9]+)mg', # 13mg r"([0-9]+)mg", # 13mg
r'([0-9]+)\.([0-9]+)m' # 1.2m r"([0-9]+)\.([0-9]+)m", # 1.2m
]) ]
)
_infixes = (LIST_ELLIPSES + LIST_ICONS + _infixes = (
[r'(?<=[0-9])[+\/\-\*^](?=[0-9])', # 1/2 , 1-2 , 1*2 LIST_ELLIPSES
r'([a-zA-Z]+)\/([a-zA-Z]+)\/([a-zA-Z]+)', # name1/name2/name3 + LIST_ICONS
r'([0-9])+(\.([0-9]+))*([\-]([0-9])+)+', # 10.9 , 10.9.9 , 10.9-6 + [
r'([0-9])+[,]([0-9])+[\-]([0-9])+[,]([0-9])+', # 10,11,12 r"(?<=[0-9])[+\/\-\*^](?=[0-9])", # 1/2 , 1-2 , 1*2
r'([0-9])+[ης]+([\-]([0-9])+)+', # 1ης-2 r"([a-zA-Z]+)\/([a-zA-Z]+)\/([a-zA-Z]+)", # name1/name2/name3
r"([0-9])+(\.([0-9]+))*([\-]([0-9])+)+", # 10.9 , 10.9.9 , 10.9-6
r"([0-9])+[,]([0-9])+[\-]([0-9])+[,]([0-9])+", # 10,11,12
r"([0-9])+[ης]+([\-]([0-9])+)+", # 1ης-2
# 15/2 , 15/2/17 , 2017/2/15 # 15/2 , 15/2/17 , 2017/2/15
r'([0-9]){1,4}[\/]([0-9]){1,2}([\/]([0-9]){0,4}){0,1}', r"([0-9]){1,4}[\/]([0-9]){1,2}([\/]([0-9]){0,4}){0,1}",
r'[A-Za-z]+\@[A-Za-z]+(\-[A-Za-z]+)*\.[A-Za-z]+', # abc@cde-fgh.a r"[A-Za-z]+\@[A-Za-z]+(\-[A-Za-z]+)*\.[A-Za-z]+", # abc@cde-fgh.a
r'([a-zA-Z]+)(\-([a-zA-Z]+))+', # abc-abc r"([a-zA-Z]+)(\-([a-zA-Z]+))+", # abc-abc
r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER), r"(?<=[{}])\.(?=[{}])".format(ALPHA_LOWER, ALPHA_UPPER),
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA), r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS), r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS),
r'(?<=[{a}"])[:<>=/](?=[{a}])'.format(a=ALPHA)]) r'(?<=[{a}"])[:<>=/](?=[{a}])'.format(a=ALPHA),
]
)
TOKENIZER_PREFIXES = _prefixes TOKENIZER_PREFIXES = _prefixes
TOKENIZER_SUFFIXES = _suffixes TOKENIZER_SUFFIXES = _suffixes


@ -1,13 +1,11 @@
# -*- coding: utf-8 -*- # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
# Stop words # Stop words
# Link to greek stop words: https://www.translatum.gr/forum/index.php?topic=3550.0?topic=3550.0 # Link to greek stop words: https://www.translatum.gr/forum/index.php?topic=3550.0?topic=3550.0
STOP_WORDS = set(
"""
STOP_WORDS = set("""
αδιάκοπα αι ακόμα ακόμη ακριβώς άλλα αλλά αλλαχού άλλες άλλη άλλην αδιάκοπα αι ακόμα ακόμη ακριβώς άλλα αλλά αλλαχού άλλες άλλη άλλην
άλλης αλλιώς αλλιώτικα άλλο άλλοι αλλοιώς αλλοιώτικα άλλον άλλος άλλοτε αλλού άλλης αλλιώς αλλιώτικα άλλο άλλοι αλλοιώς αλλοιώτικα άλλον άλλος άλλοτε αλλού
άλλους άλλων άμα άμεσα αμέσως αν ανά ανάμεσα αναμεταξύ άνευ αντί αντίπερα αντίς άλλους άλλων άμα άμεσα αμέσως αν ανά ανάμεσα αναμεταξύ άνευ αντί αντίπερα αντίς
@ -89,4 +87,5 @@ STOP_WORDS = set("""
χωρίς χωριστά χωρίς χωριστά
ω ως ωσάν ωσότου ώσπου ώστε ωστόσο ωχ ω ως ωσάν ωσότου ώσπου ώστε ωστόσο ωχ
""".split()) """.split()
)


@ -8,18 +8,16 @@ def noun_chunks(obj):
""" """
Detect base noun phrases. Works on both Doc and Span. Detect base noun phrases. Works on both Doc and Span.
""" """
# It follows the logic of the noun chunks finder of English language,
# it follows the logic of the noun chunks finder of English language,
# adjusted to some Greek language special characteristics. # adjusted to some Greek language special characteristics.
# obj tag corrects some DEP tagger mistakes. # obj tag corrects some DEP tagger mistakes.
# Further improvement of the models will eliminate the need for this tag. # Further improvement of the models will eliminate the need for this tag.
labels = ['nsubj', 'obj', 'iobj', 'appos', 'ROOT', 'obl'] labels = ["nsubj", "obj", "iobj", "appos", "ROOT", "obl"]
doc = obj.doc # Ensure works on both Doc and Span. doc = obj.doc # Ensure works on both Doc and Span.
np_deps = [doc.vocab.strings.add(label) for label in labels] np_deps = [doc.vocab.strings.add(label) for label in labels]
conj = doc.vocab.strings.add('conj') conj = doc.vocab.strings.add("conj")
nmod = doc.vocab.strings.add('nmod') nmod = doc.vocab.strings.add("nmod")
np_label = doc.vocab.strings.add('NP') np_label = doc.vocab.strings.add("NP")
seen = set() seen = set()
for i, word in enumerate(obj): for i, word in enumerate(obj):
if word.pos not in (NOUN, PROPN, PRON): if word.pos not in (NOUN, PROPN, PRON):
@ -31,16 +29,17 @@ def noun_chunks(obj):
if any(w.i in seen for w in word.subtree): if any(w.i in seen for w in word.subtree):
continue continue
flag = False flag = False
if (word.pos == NOUN): if word.pos == NOUN:
# check for patterns such as γραμμή παραγωγής # check for patterns such as γραμμή παραγωγής
for potential_nmod in word.rights: for potential_nmod in word.rights:
if (potential_nmod.dep == nmod): if potential_nmod.dep == nmod:
seen.update(j for j in range( seen.update(
word.left_edge.i, potential_nmod.i + 1)) j for j in range(word.left_edge.i, potential_nmod.i + 1)
)
yield word.left_edge.i, potential_nmod.i + 1, np_label yield word.left_edge.i, potential_nmod.i + 1, np_label
flag = True flag = True
break break
if (flag is False): if flag is False:
seen.update(j for j in range(word.left_edge.i, word.i + 1)) seen.update(j for j in range(word.left_edge.i, word.i + 1))
yield word.left_edge.i, word.i + 1, np_label yield word.left_edge.i, word.i + 1, np_label
elif word.dep == conj: elif word.dep == conj:
@ -56,6 +55,4 @@ def noun_chunks(obj):
yield word.left_edge.i, word.i + 1, np_label yield word.left_edge.i, word.i + 1, np_label
SYNTAX_ITERATORS = { SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}
'noun_chunks': noun_chunks
}

File diff suppressed because it is too large


@ -1,3 +1,4 @@
# coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
from ...symbols import POS, ADV, NOUN, ADP, PRON, SCONJ, PROPN, DET, SYM, INTJ from ...symbols import POS, ADV, NOUN, ADP, PRON, SCONJ, PROPN, DET, SYM, INTJ
@ -22,5 +23,5 @@ TAG_MAP = {
"AUX": {POS: AUX}, "AUX": {POS: AUX},
"SPACE": {POS: SPACE}, "SPACE": {POS: SPACE},
"DET": {POS: DET}, "DET": {POS: DET},
"X": {POS: X} "X": {POS: X},
} }


@ -1,303 +1,132 @@
# -*- coding: utf-8 -*- # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
from ...symbols import ORTH, LEMMA, NORM from ...symbols import ORTH, LEMMA, NORM
_exc = {} _exc = {}
for token in ["Απ'", "ΑΠ'", "αφ'", "Αφ'"]: for token in ["Απ'", "ΑΠ'", "αφ'", "Αφ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "από", NORM: "από"}]
{ORTH: token, LEMMA: "από", NORM: "από"}
]
for token in ["Αλλ'", "αλλ'"]: for token in ["Αλλ'", "αλλ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "αλλά", NORM: "αλλά"}]
{ORTH: token, LEMMA: "αλλά", NORM: "αλλά"}
]
for token in ["παρ'", "Παρ'", "ΠΑΡ'"]: for token in ["παρ'", "Παρ'", "ΠΑΡ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "παρά", NORM: "παρά"}]
{ORTH: token, LEMMA: "παρά", NORM: "παρά"}
]
for token in ["καθ'", "Καθ'"]: for token in ["καθ'", "Καθ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "κάθε", NORM: "κάθε"}]
{ORTH: token, LEMMA: "κάθε", NORM: "κάθε"}
]
for token in ["κατ'", "Κατ'"]: for token in ["κατ'", "Κατ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "κατά", NORM: "κατά"}]
{ORTH: token, LEMMA: "κατά", NORM: "κατά"}
]
for token in ["'ΣΟΥΝ", "'ναι", "'ταν", "'τανε", "'μαστε", "'μουνα", "'μουν"]: for token in ["'ΣΟΥΝ", "'ναι", "'ταν", "'τανε", "'μαστε", "'μουνα", "'μουν"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "είμαι", NORM: "είμαι"}]
{ORTH: token, LEMMA: "είμαι", NORM: "είμαι"}
]
for token in ["Επ'", "επ'", "εφ'", "Εφ'"]: for token in ["Επ'", "επ'", "εφ'", "Εφ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "επί", NORM: "επί"}]
{ORTH: token, LEMMA: "επί", NORM: "επί"}
]
for token in ["Δι'", "δι'"]: for token in ["Δι'", "δι'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "δια", NORM: "δια"}]
{ORTH: token, LEMMA: "δια", NORM: "δια"}
]
for token in ["'χουν", "'χουμε", "'χαμε", "'χα", "'χε", "'χεις", "'χει"]: for token in ["'χουν", "'χουμε", "'χαμε", "'χα", "'χε", "'χεις", "'χει"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "έχω", NORM: "έχω"}]
{ORTH: token, LEMMA: "έχω", NORM: "έχω"}
]
for token in ["υπ'", "Υπ'"]: for token in ["υπ'", "Υπ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "υπό", NORM: "υπό"}]
{ORTH: token, LEMMA: "υπό", NORM: "υπό"}
]
for token in ["Μετ'", "ΜΕΤ'", "'μετ"]: for token in ["Μετ'", "ΜΕΤ'", "'μετ"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "μετά", NORM: "μετά"}]
{ORTH: token, LEMMA: "μετά", NORM: "μετά"}
]
for token in ["Μ'", "μ'"]: for token in ["Μ'", "μ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "με", NORM: "με"}]
{ORTH: token, LEMMA: "με", NORM: "με"}
]
for token in ["Γι'", "ΓΙ'", "γι'"]: for token in ["Γι'", "ΓΙ'", "γι'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "για", NORM: "για"}]
{ORTH: token, LEMMA: "για", NORM: "για"}
]
for token in ["Σ'", "σ'"]: for token in ["Σ'", "σ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "σε", NORM: "σε"}]
{ORTH: token, LEMMA: "σε", NORM: "σε"}
]
for token in ["Θ'", "θ'"]: for token in ["Θ'", "θ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "θα", NORM: "θα"}]
{ORTH: token, LEMMA: "θα", NORM: "θα"}
]
for token in ["Ν'", "ν'"]: for token in ["Ν'", "ν'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "να", NORM: "να"}]
{ORTH: token, LEMMA: "να", NORM: "να"}
]
for token in ["Τ'", "τ'"]: for token in ["Τ'", "τ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "να", NORM: "να"}]
{ORTH: token, LEMMA: "να", NORM: "να"}
]
for token in ["'γω", "'σένα", "'μεις"]: for token in ["'γω", "'σένα", "'μεις"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "εγώ", NORM: "εγώ"}]
{ORTH: token, LEMMA: "εγώ", NORM: "εγώ"}
]
for token in ["Τ'", "τ'"]: for token in ["Τ'", "τ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "το", NORM: "το"}]
{ORTH: token, LEMMA: "το", NORM: "το"}
]
for token in ["Φέρ'", "Φερ'", "φέρ'", "φερ'"]: for token in ["Φέρ'", "Φερ'", "φέρ'", "φερ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "φέρνω", NORM: "φέρνω"}]
{ORTH: token, LEMMA: "φέρνω", NORM: "φέρνω"}
]
for token in ["'ρθούνε", "'ρθουν", "'ρθει", "'ρθεί", "'ρθε", "'ρχεται"]: for token in ["'ρθούνε", "'ρθουν", "'ρθει", "'ρθεί", "'ρθε", "'ρχεται"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "έρχομαι", NORM: "έρχομαι"}]
{ORTH: token, LEMMA: "έρχομαι", NORM: "έρχομαι"}
]
for token in ["'πανε", "'λεγε", "'λεγαν", "'πε", "'λεγα"]: for token in ["'πανε", "'λεγε", "'λεγαν", "'πε", "'λεγα"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "λέγω", NORM: "λέγω"}]
{ORTH: token, LEMMA: "λέγω", NORM: "λέγω"}
]
for token in ["Πάρ'", "πάρ'"]: for token in ["Πάρ'", "πάρ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "παίρνω", NORM: "παίρνω"}]
{ORTH: token, LEMMA: "παίρνω", NORM: "παίρνω"}
]
for token in ["μέσ'", "Μέσ'", "μεσ'"]: for token in ["μέσ'", "Μέσ'", "μεσ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "μέσα", NORM: "μέσα"}]
{ORTH: token, LEMMA: "μέσα", NORM: "μέσα"}
]
for token in ["Δέσ'", "Δεσ'", "δεσ'"]: for token in ["Δέσ'", "Δεσ'", "δεσ'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "δένω", NORM: "δένω"}]
{ORTH: token, LEMMA: "δένω", NORM: "δένω"}
]
for token in ["'κανε", "Κάν'"]: for token in ["'κανε", "Κάν'"]:
_exc[token] = [ _exc[token] = [{ORTH: token, LEMMA: "κάνω", NORM: "κάνω"}]
{ORTH: token, LEMMA: "κάνω", NORM: "κάνω"}
]
_other_exc = { _other_exc = {
"κι": [{ORTH: "κι", LEMMA: "και", NORM: "και"}],
"κι": [ "Παίξ'": [{ORTH: "Παίξ'", LEMMA: "παίζω", NORM: "παίζω"}],
{ORTH: "κι", LEMMA: "και", NORM: "και"}, "Αντ'": [{ORTH: "Αντ'", LEMMA: "αντί", NORM: "αντί"}],
], "ολ'": [{ORTH: "ολ'", LEMMA: "όλος", NORM: "όλος"}],
"ύστερ'": [{ORTH: "ύστερ'", LEMMA: "ύστερα", NORM: "ύστερα"}],
"Παίξ'": [ "'πρεπε": [{ORTH: "'πρεπε", LEMMA: "πρέπει", NORM: "πρέπει"}],
{ORTH: "Παίξ'", LEMMA: "παίζω", NORM: "παίζω"}, "Δύσκολ'": [{ORTH: "Δύσκολ'", LEMMA: "δύσκολος", NORM: "δύσκολος"}],
], "'θελα": [{ORTH: "'θελα", LEMMA: "θέλω", NORM: "θέλω"}],
"'γραφα": [{ORTH: "'γραφα", LEMMA: "γράφω", NORM: "γράφω"}],
"Αντ'": [ "'παιρνα": [{ORTH: "'παιρνα", LEMMA: "παίρνω", NORM: "παίρνω"}],
{ORTH: "Αντ'", LEMMA: "αντί", NORM: "αντί"}, "'δειξε": [{ORTH: "'δειξε", LEMMA: "δείχνω", NORM: "δείχνω"}],
], "όμουρφ'": [{ORTH: "όμουρφ'", LEMMA: "όμορφος", NORM: "όμορφος"}],
"κ'τσή": [{ORTH: "κ'τσή", LEMMA: "κουτσός", NORM: "κουτσός"}],
"ολ'": [ "μηδ'": [{ORTH: "μηδ'", LEMMA: "μήδε", NORM: "μήδε"}],
{ORTH: "ολ'", LEMMA: "όλος", NORM: "όλος"},
],
"ύστερ'": [
{ORTH: "ύστερ'", LEMMA: "ύστερα", NORM: "ύστερα"},
],
"'πρεπε": [
{ORTH: "'πρεπε", LEMMA: "πρέπει", NORM: "πρέπει"},
],
"Δύσκολ'": [
{ORTH: "Δύσκολ'", LEMMA: "δύσκολος", NORM: "δύσκολος"},
],
"'θελα": [
{ORTH: "'θελα", LEMMA: "θέλω", NORM: "θέλω"},
],
"'γραφα": [
{ORTH: "'γραφα", LEMMA: "γράφω", NORM: "γράφω"},
],
"'παιρνα": [
{ORTH: "'παιρνα", LEMMA: "παίρνω", NORM: "παίρνω"},
],
"'δειξε": [
{ORTH: "'δειξε", LEMMA: "δείχνω", NORM: "δείχνω"},
],
"όμουρφ'": [
{ORTH: "όμουρφ'", LEMMA: "όμορφος", NORM: "όμορφος"},
],
"κ'τσή": [
{ORTH: "κ'τσή", LEMMA: "κουτσός", NORM: "κουτσός"},
],
"μηδ'": [
{ORTH: "μηδ'", LEMMA: "μήδε", NORM: "μήδε"},
],
"'ξομολογήθηκε": [ "'ξομολογήθηκε": [
{ORTH: "'ξομολογήθηκε", LEMMA: "εξομολογούμαι", NORM: "εξομολογούμαι"}, {ORTH: "'ξομολογήθηκε", LEMMA: "εξομολογούμαι", NORM: "εξομολογούμαι"}
], ],
"'μας": [{ORTH: "'μας", LEMMA: "εμάς", NORM: "εμάς"}],
"'μας": [ "'ξερες": [{ORTH: "'ξερες", LEMMA: "ξέρω", NORM: "ξέρω"}],
{ORTH: "'μας", LEMMA: "εμάς", NORM: "εμάς"}, "έφθασ'": [{ORTH: "έφθασ'", LEMMA: "φθάνω", NORM: "φθάνω"}],
], "εξ'": [{ORTH: "εξ'", LEMMA: "εκ", NORM: "εκ"}],
"δώσ'": [{ORTH: "δώσ'", LEMMA: "δίνω", NORM: "δίνω"}],
"'ξερες": [ "τίποτ'": [{ORTH: "τίποτ'", LEMMA: "τίποτα", NORM: "τίποτα"}],
{ORTH: "'ξερες", LEMMA: "ξέρω", NORM: "ξέρω"}, "Λήξ'": [{ORTH: "Λήξ'", LEMMA: "λήγω", NORM: "λήγω"}],
], "άσ'": [{ORTH: "άσ'", LEMMA: "αφήνω", NORM: "αφήνω"}],
"Στ'": [{ORTH: "Στ'", LEMMA: "στο", NORM: "στο"}],
"έφθασ'": [ "Δωσ'": [{ORTH: "Δωσ'", LEMMA: "δίνω", NORM: "δίνω"}],
{ORTH: "έφθασ'", LEMMA: "φθάνω", NORM: "φθάνω"}, "Βάψ'": [{ORTH: "Βάψ'", LEMMA: "βάφω", NORM: "βάφω"}],
], "Αλλ'": [{ORTH: "Αλλ'", LEMMA: "αλλά", NORM: "αλλά"}],
"Αμ'": [{ORTH: "Αμ'", LEMMA: "άμα", NORM: "άμα"}],
"εξ'": [ "Αγόρασ'": [{ORTH: "Αγόρασ'", LEMMA: "αγοράζω", NORM: "αγοράζω"}],
{ORTH: "εξ'", LEMMA: "εκ", NORM: "εκ"}, "'φύγε": [{ORTH: "'φύγε", LEMMA: "φεύγω", NORM: "φεύγω"}],
], "'φερε": [{ORTH: "'φερε", LEMMA: "φέρνω", NORM: "φέρνω"}],
"'φαγε": [{ORTH: "'φαγε", LEMMA: "τρώω", NORM: "τρώω"}],
"δώσ'": [ "'σπαγαν": [{ORTH: "'σπαγαν", LEMMA: "σπάω", NORM: "σπάω"}],
{ORTH: "δώσ'", LEMMA: "δίνω", NORM: "δίνω"}, "'σκασε": [{ORTH: "'σκασε", LEMMA: "σκάω", NORM: "σκάω"}],
], "'σβηνε": [{ORTH: "'σβηνε", LEMMA: "σβήνω", NORM: "σβήνω"}],
"'ριξε": [{ORTH: "'ριξε", LEMMA: "ρίχνω", NORM: "ρίχνω"}],
"τίποτ'": [ "'κλεβε": [{ORTH: "'κλεβε", LEMMA: "κλέβω", NORM: "κλέβω"}],
{ORTH: "τίποτ'", LEMMA: "τίποτα", NORM: "τίποτα"}, "'κει": [{ORTH: "'κει", LEMMA: "εκεί", NORM: "εκεί"}],
], "'βλεπε": [{ORTH: "'βλεπε", LEMMA: "βλέπω", NORM: "βλέπω"}],
"'βγαινε": [{ORTH: "'βγαινε", LEMMA: "βγαίνω", NORM: "βγαίνω"}],
"Λήξ'": [
{ORTH: "Λήξ'", LEMMA: "λήγω", NORM: "λήγω"},
],
"άσ'": [
{ORTH: "άσ'", LEMMA: "αφήνω", NORM: "αφήνω"},
],
"Στ'": [
{ORTH: "Στ'", LEMMA: "στο", NORM: "στο"},
],
"Δωσ'": [
{ORTH: "Δωσ'", LEMMA: "δίνω", NORM: "δίνω"},
],
"Βάψ'": [
{ORTH: "Βάψ'", LEMMA: "βάφω", NORM: "βάφω"},
],
"Αλλ'": [
{ORTH: "Αλλ'", LEMMA: "αλλά", NORM: "αλλά"},
],
"Αμ'": [
{ORTH: "Αμ'", LEMMA: "άμα", NORM: "άμα"},
],
"Αγόρασ'": [
{ORTH: "Αγόρασ'", LEMMA: "αγοράζω", NORM: "αγοράζω"},
],
"'φύγε": [
{ORTH: "'φύγε", LEMMA: "φεύγω", NORM: "φεύγω"},
],
"'φερε": [
{ORTH: "'φερε", LEMMA: "φέρνω", NORM: "φέρνω"},
],
"'φαγε": [
{ORTH: "'φαγε", LEMMA: "τρώω", NORM: "τρώω"},
],
"'σπαγαν": [
{ORTH: "'σπαγαν", LEMMA: "σπάω", NORM: "σπάω"},
],
"'σκασε": [
{ORTH: "'σκασε", LEMMA: "σκάω", NORM: "σκάω"},
],
"'σβηνε": [
{ORTH: "'σβηνε", LEMMA: "σβήνω", NORM: "σβήνω"},
],
"'ριξε": [
{ORTH: "'ριξε", LEMMA: "ρίχνω", NORM: "ρίχνω"},
],
"'κλεβε": [
{ORTH: "'κλεβε", LEMMA: "κλέβω", NORM: "κλέβω"},
],
"'κει": [
{ORTH: "'κει", LEMMA: "εκεί", NORM: "εκεί"},
],
"'βλεπε": [
{ORTH: "'βλεπε", LEMMA: "βλέπω", NORM: "βλέπω"},
],
"'βγαινε": [
{ORTH: "'βγαινε", LEMMA: "βγαίνω", NORM: "βγαίνω"},
]
} }
_exc.update(_other_exc) _exc.update(_other_exc)
@ -307,12 +136,14 @@ for h in range(1, 12 + 1):
for period in ["π.μ.", "πμ"]: for period in ["π.μ.", "πμ"]:
_exc["%d%s" % (h, period)] = [ _exc["%d%s" % (h, period)] = [
{ORTH: "%d" % h}, {ORTH: "%d" % h},
{ORTH: period, LEMMA: "π.μ.", NORM: "π.μ."}] {ORTH: period, LEMMA: "π.μ.", NORM: "π.μ."},
]
for period in ["μ.μ.", "μμ"]: for period in ["μ.μ.", "μμ"]:
_exc["%d%s" % (h, period)] = [ _exc["%d%s" % (h, period)] = [
{ORTH: "%d" % h}, {ORTH: "%d" % h},
{ORTH: period, LEMMA: "μ.μ.", NORM: "μ.μ."}] {ORTH: period, LEMMA: "μ.μ.", NORM: "μ.μ."},
]
for exc_data in [ for exc_data in [
{ORTH: "ΑΓΡ.", LEMMA: "Αγροτικός", NORM: "Αγροτικός"}, {ORTH: "ΑΓΡ.", LEMMA: "Αγροτικός", NORM: "Αγροτικός"},
@ -339,43 +170,228 @@ for exc_data in [
for orth in [ for orth in [
"$ΗΠΑ", "$ΗΠΑ",
"Α'", "Α.Ε.", "Α.Ε.Β.Ε.", "Α.Ε.Ι.", "Α.Ε.Π.", "Α.Μ.Α.", "Α.Π.Θ.", "Α.Τ.", "Α.Χ.", "ΑΝ.", "Αγ.", "Αλ.", "Αν.", "Α'",
"Αντ.", "Απ.", "Α.Ε.",
"Β'", "Β)", "Β.Ζ.", "Β.Ι.Ο.", "Β.Κ.", "Β.Μ.Α.", "Βασ.", "Α.Ε.Β.Ε.",
"Γ'", "Γ)", "Γ.Γ.", "Γ.Δ.", "Γκ.", "Α.Ε.Ι.",
"Δ.Ε.Η.", "Δ.Ε.Σ.Ε.", "Δ.Ν.", "Δ.Ο.Υ.", "Δ.Σ.", "Δ.Υ.", "ΔΙ.ΚΑ.Τ.Σ.Α.", "Δηλ.", "Διον.", "Α.Ε.Π.",
"Ε.Α.", "Ε.Α.Κ.", "Ε.Α.Π.", "Ε.Ε.", "Ε.Κ.", "Ε.ΚΕ.ΠΙΣ.", "Ε.Λ.Α.", "Ε.Λ.Ι.Α.", "Ε.Π.Σ.", "Ε.Π.Τ.Α.", "Ε.Σ.Ε.Ε.Κ.", "Α.Μ.Α.",
"Ε.Υ.Κ.", "ΕΕ.", "ΕΚ.", "ΕΛ.", "ΕΛ.ΑΣ.", "Εθν.", "Ελ.", "Εμ.", "Επ.", "Ευ.", "Α.Π.Θ.",
"Η'", "Η.Π.Α.", "Α.Τ.",
"ΘΕ.", "Θεμ.", "Θεοδ.", "Θρ.", "Α.Χ.",
"Ι.Ε.Κ.", "Ι.Κ.Α.", "Ι.Κ.Υ.", "Ι.Σ.Θ.", "Ι.Χ.", "ΙΖ'", "ΙΧ.", "ΑΝ.",
"Κ.Α.Α.", "Κ.Α.Ε.", "Κ.Β.Σ.", "Κ.Δ.", "Κ.Ε.", "Κ.Ε.Κ.", "Κ.Ι.", "Κ.Κ.", "Κ.Ι.Θ.", "Κ.Ι.Θ.", "Κ.ΚΕΚ.", "Κ.Ο.", "Αγ.",
"Κ.Π.Ρ.", "ΚΑΤ.", "ΚΚ.", "Καν.", "Καρ.", "Κατ.", "Κυρ.", "Κων.", "Αλ.",
"Λ.Α.", "Λ.χ.", "Λ.Χ.", "Λεωφ.", "Λι.", "Αν.",
"Μ.Δ.Ε.", "Μ.Ε.Ο.", "Μ.Ζ.", "Μ.Μ.Ε.", "Μ.Ο.", "Μεγ.", "Μιλτ.", "Μιχ.", "Αντ.",
"Ν.Δ.", "Ν.Ε.Α.", "Ν.Κ.", "Ν.Ο.", "Ν.Ο.Θ.", "Ν.Π.Δ.Δ.", "Ν.Υ.", "ΝΔ.", "Νικ.", "Ντ'", "Ντ.", "Απ.",
"Ο'", "Ο.Α.", "Ο.Α.Ε.Δ.", "Ο.Δ.", "Ο.Ε.Ε.", "Ο.Ε.Ε.Κ.", "Ο.Η.Ε.", "Ο.Κ.", "Β'",
"Π.Δ.", "Π.Ε.Κ.Δ.Υ.", "Π.Ε.Π.", "Π.Μ.Σ.", "ΠΟΛ.", "Π.Χ.", "Παρ.", "Πλ.", "Πρ.", "Β)",
"Σ.Δ.Ο.Ε.", "Σ.Ε.", "Σ.Ε.Κ.", "Σ.Π.Δ.Ω.Β.", "Σ.Τ.", "Σαβ.", "Στ.", "ΣτΕ.", "Στρ.", "Β.Ζ.",
"Τ.Α.", "Τ.Ε.Ε.", "Τ.Ε.Ι.", "ΤΡ.", "Τζ.", "Τηλ.", "Β.Ι.Ο.",
"Υ.Γ.", "ΥΓ.", "ΥΠ.Ε.Π.Θ.", "Β.Κ.",
"Φ.Α.Β.Ε.", "Φ.Κ.", "Φ.Σ.", "Φ.Χ.", "Φ.Π.Α.", "Φιλ.", "Β.Μ.Α.",
"Χ.Α.Α.", "ΧΡ.", "Χ.Χ.", "Χαρ.", "Χιλ.", "Χρ.", "Βασ.",
"άγ.", "άρθρ.", "αι.", "αν.", "απ.", "αρ.", "αριθ.", "αριθμ.", "Γ'",
"β'", "βλ.", "Γ)",
"γ.γ.", "γεν.", "γραμμ.", "Γ.Γ.",
"δ.δ.", "δ.σ.", "δηλ.", "δισ.", "δολ.", "δρχ.", "Γ.Δ.",
"εκ.", "εκατ.", "ελ.", "Γκ.",
"Δ.Ε.Η.",
"Δ.Ε.Σ.Ε.",
"Δ.Ν.",
"Δ.Ο.Υ.",
"Δ.Σ.",
"Δ.Υ.",
"ΔΙ.ΚΑ.Τ.Σ.Α.",
"Δηλ.",
"Διον.",
"Ε.Α.",
"Ε.Α.Κ.",
"Ε.Α.Π.",
"Ε.Ε.",
"Ε.Κ.",
"Ε.ΚΕ.ΠΙΣ.",
"Ε.Λ.Α.",
"Ε.Λ.Ι.Α.",
"Ε.Π.Σ.",
"Ε.Π.Τ.Α.",
"Ε.Σ.Ε.Ε.Κ.",
"Ε.Υ.Κ.",
"ΕΕ.",
"ΕΚ.",
"ΕΛ.",
"ΕΛ.ΑΣ.",
"Εθν.",
"Ελ.",
"Εμ.",
"Επ.",
"Ευ.",
"Η'",
"Η.Π.Α.",
"ΘΕ.",
"Θεμ.",
"Θεοδ.",
"Θρ.",
"Ι.Ε.Κ.",
"Ι.Κ.Α.",
"Ι.Κ.Υ.",
"Ι.Σ.Θ.",
"Ι.Χ.",
"ΙΖ'",
"ΙΧ.",
"Κ.Α.Α.",
"Κ.Α.Ε.",
"Κ.Β.Σ.",
"Κ.Δ.",
"Κ.Ε.",
"Κ.Ε.Κ.",
"Κ.Ι.",
"Κ.Κ.",
"Κ.Ι.Θ.",
"Κ.Ι.Θ.",
"Κ.ΚΕΚ.",
"Κ.Ο.",
"Κ.Π.Ρ.",
"ΚΑΤ.",
"ΚΚ.",
"Καν.",
"Καρ.",
"Κατ.",
"Κυρ.",
"Κων.",
"Λ.Α.",
"Λ.χ.",
"Λ.Χ.",
"Λεωφ.",
"Λι.",
"Μ.Δ.Ε.",
"Μ.Ε.Ο.",
"Μ.Ζ.",
"Μ.Μ.Ε.",
"Μ.Ο.",
"Μεγ.",
"Μιλτ.",
"Μιχ.",
"Ν.Δ.",
"Ν.Ε.Α.",
"Ν.Κ.",
"Ν.Ο.",
"Ν.Ο.Θ.",
"Ν.Π.Δ.Δ.",
"Ν.Υ.",
"ΝΔ.",
"Νικ.",
"Ντ'",
"Ντ.",
"Ο'",
"Ο.Α.",
"Ο.Α.Ε.Δ.",
"Ο.Δ.",
"Ο.Ε.Ε.",
"Ο.Ε.Ε.Κ.",
"Ο.Η.Ε.",
"Ο.Κ.",
"Π.Δ.",
"Π.Ε.Κ.Δ.Υ.",
"Π.Ε.Π.",
"Π.Μ.Σ.",
"ΠΟΛ.",
"Π.Χ.",
"Παρ.",
"Πλ.",
"Πρ.",
"Σ.Δ.Ο.Ε.",
"Σ.Ε.",
"Σ.Ε.Κ.",
"Σ.Π.Δ.Ω.Β.",
"Σ.Τ.",
"Σαβ.",
"Στ.",
"ΣτΕ.",
"Στρ.",
"Τ.Α.",
"Τ.Ε.Ε.",
"Τ.Ε.Ι.",
"ΤΡ.",
"Τζ.",
"Τηλ.",
"Υ.Γ.",
"ΥΓ.",
"ΥΠ.Ε.Π.Θ.",
"Φ.Α.Β.Ε.",
"Φ.Κ.",
"Φ.Σ.",
"Φ.Χ.",
"Φ.Π.Α.",
"Φιλ.",
"Χ.Α.Α.",
"ΧΡ.",
"Χ.Χ.",
"Χαρ.",
"Χιλ.",
"Χρ.",
"άγ.",
"άρθρ.",
"αι.",
"αν.",
"απ.",
"αρ.",
"αριθ.",
"αριθμ.",
"β'",
"βλ.",
"γ.γ.",
"γεν.",
"γραμμ.",
"δ.δ.",
"δ.σ.",
"δηλ.",
"δισ.",
"δολ.",
"δρχ.",
"εκ.",
"εκατ.",
"ελ.",
"θιν'", "θιν'",
"κ.", "κ.ά.", "κ.α.", "κ.κ.", "κ.λπ.", "κ.ο.κ.", "κ.τ.λ.", "κλπ.", "κτλ.", "κυβ.", "κ.",
"κ.ά.",
"κ.α.",
"κ.κ.",
"κ.λπ.",
"κ.ο.κ.",
"κ.τ.λ.",
"κλπ.",
"κτλ.",
"κυβ.",
"λ.χ.", "λ.χ.",
"μ.", "μ.Χ.", "μ.μ.", "μιλ.", "μ.",
"μ.Χ.",
"μ.μ.",
"μιλ.",
"ντ'", "ντ'",
"π.Χ.", "π.β.", "π.δ.", "π.μ.", "π.χ.", "π.Χ.",
"σ.", "σ.α.λ.", "σ.σ.", "σελ.", "στρ.", "π.β.",
"τ'ς", "τ.μ.", "τετ.", "τετρ.", "τηλ.", "τρισ.", "τόν.", "π.δ.",
"π.μ.",
"π.χ.",
"σ.",
"σ.α.λ.",
"σ.σ.",
"σελ.",
"στρ.",
"τ'ς",
"τ.μ.",
"τετ.",
"τετρ.",
"τηλ.",
"τρισ.",
"τόν.",
"υπ.", "υπ.",
"χ.μ.", "χγρ.", "χιλ.", "χλμ." "χ.μ.",
"χγρ.",
"χιλ.",
"χλμ.",
]: ]:
_exc[orth] = [{ORTH: orth}] _exc[orth] = [{ORTH: orth}]
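A hedged usage sketch for the exceptions above: the generated `"%dπμ"` / `"%dμμ"` special cases should split a digit-plus-period token into two tokens, with the normalised form on the period. Assumes spaCy with the Greek data from this branch.

```python
# Hedged sketch: time expressions like "11πμ" hit the generated special cases.
from spacy.lang.el import Greek

nlp = Greek()
doc = nlp("Η συνάντηση είναι στις 11πμ")  # "The meeting is at 11am"
print([(t.text, t.norm_) for t in doc])
# "11πμ" should surface as two tokens, "11" and "πμ" (NORM "π.μ.")
```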


@ -16,15 +16,18 @@ from ...language import Language
from ...attrs import LANG, NORM from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups from ...util import update_exc, add_lookups
def _return_en(_): def _return_en(_):
return 'en' return "en"
class EnglishDefaults(Language.Defaults): class EnglishDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS) lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = _return_en lex_attr_getters[LANG] = _return_en
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], lex_attr_getters[NORM] = add_lookups(
BASE_NORMS, NORM_EXCEPTIONS) Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
tag_map = TAG_MAP tag_map = TAG_MAP
stop_words = STOP_WORDS stop_words = STOP_WORDS
@ -37,8 +40,8 @@ class EnglishDefaults(Language.Defaults):
class English(Language): class English(Language):
lang = 'en' lang = "en"
Defaults = EnglishDefaults Defaults = EnglishDefaults
__all__ = ['English'] __all__ = ["English"]
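The blank `English` class assembled above can be instantiated directly; a short hedged sketch of what the defaults buy you without a trained model:

```python
# Hedged sketch: the blank English pipeline uses the defaults collected above
# (tokenizer exceptions, stop words, lexical attributes) with no statistical model.
from spacy.lang.en import English

nlp = English()
doc = nlp("We aren't going home")
print([t.text for t in doc])
# the tokenizer exceptions should split "aren't" into "are" + "n't"
```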


@ -18,5 +18,5 @@ sentences = [
"Where are you?", "Where are you?",
"Who is the president of France?", "Who is the president of France?",
"What is the capital of the United States?", "What is the capital of the United States?",
"When was Barack Obama born?" "When was Barack Obama born?",
] ]


@ -1,7 +1,7 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
from .lookup import LOOKUP from .lookup import LOOKUP # noqa: F401
from ._adjectives import ADJECTIVES from ._adjectives import ADJECTIVES
from ._adjectives_irreg import ADJECTIVES_IRREG from ._adjectives_irreg import ADJECTIVES_IRREG
from ._adverbs import ADVERBS from ._adverbs import ADVERBS
@ -13,10 +13,18 @@ from ._verbs_irreg import VERBS_IRREG
from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES, PUNCT_RULES from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES, PUNCT_RULES
LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS} LEMMA_INDEX = {"adj": ADJECTIVES, "adv": ADVERBS, "noun": NOUNS, "verb": VERBS}
LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'adv': ADVERBS_IRREG, 'noun': NOUNS_IRREG, LEMMA_EXC = {
'verb': VERBS_IRREG} "adj": ADJECTIVES_IRREG,
"adv": ADVERBS_IRREG,
"noun": NOUNS_IRREG,
"verb": VERBS_IRREG,
}
LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES, LEMMA_RULES = {
'punct': PUNCT_RULES} "adj": ADJECTIVE_RULES,
"noun": NOUN_RULES,
"verb": VERB_RULES,
"punct": PUNCT_RULES,
}
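These tables are consumed by spaCy's rule-based lemmatizer. A hedged sketch, assuming the spaCy v2.x `Lemmatizer(index, exceptions, rules)` constructor; the expected outputs follow from the rules and irregular tables in this diff:

```python
# Hedged sketch (spaCy v2.x API assumed): feeding the English tables into the
# rule-based Lemmatizer directly.
from spacy.lemmatizer import Lemmatizer
from spacy.lang.en.lemmatizer import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES

lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
print(lemmatizer("ducks", "noun"))   # expected ["duck"] via the ["s", ""] noun rule
print(lemmatizer("better", "adj"))   # expected ["good", "well"] from ADJECTIVES_IRREG
```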


@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
ADJECTIVES = set(""" ADJECTIVES = set(
"""
.22-caliber .22-calibre .38-caliber .38-calibre .45-caliber .45-calibre 0 1 10 .22-caliber .22-calibre .38-caliber .38-calibre .45-caliber .45-calibre 0 1 10
10-membered 100 1000 1000th 100th 101 101st 105 105th 10th 11 110 110th 115 10-membered 100 1000 1000th 100th 101 101st 105 105th 10th 11 110 110th 115
115th 11th 12 120 120th 125 125th 12th 13 130 130th 135 135th 13th 14 140 140th 115th 11th 12 120 120th 125 125th 12th 13 130 130th 135 135th 13th 14 140 140th
@ -2824,4 +2825,5 @@ zealous zenithal zero zeroth zestful zesty zig-zag zigzag zillion zimbabwean
zionist zippy zodiacal zoftig zoic zolaesque zonal zonary zoological zoonotic zionist zippy zodiacal zoftig zoic zolaesque zonal zonary zoological zoonotic
zoophagous zoroastrian zygodactyl zygomatic zygomorphic zygomorphous zygotic zoophagous zoroastrian zygodactyl zygomatic zygomorphic zygomorphous zygotic
zymoid zymolytic zymotic zymoid zymolytic zymotic
""".split()) """.split()
)


@ -48,8 +48,7 @@ ADJECTIVES_IRREG = {
"bendier": ("bendy",), "bendier": ("bendy",),
"bendiest": ("bendy",), "bendiest": ("bendy",),
"best": ("good",), "best": ("good",),
"better": ("good", "better": ("good", "well"),
"well",),
"bigger": ("big",), "bigger": ("big",),
"biggest": ("big",), "biggest": ("big",),
"bitchier": ("bitchy",), "bitchier": ("bitchy",),
@ -289,10 +288,8 @@ ADJECTIVES_IRREG = {
"doughtiest": ("doughty",), "doughtiest": ("doughty",),
"dowdier": ("dowdy",), "dowdier": ("dowdy",),
"dowdiest": ("dowdy",), "dowdiest": ("dowdy",),
"dowier": ("dowie", "dowier": ("dowie", "dowy"),
"dowy",), "dowiest": ("dowie", "dowy"),
"dowiest": ("dowie",
"dowy",),
"downer": ("downer",), "downer": ("downer",),
"downier": ("downy",), "downier": ("downy",),
"downiest": ("downy",), "downiest": ("downy",),
@ -1494,5 +1491,5 @@ ADJECTIVES_IRREG = {
"zanier": ("zany",), "zanier": ("zany",),
"zaniest": ("zany",), "zaniest": ("zany",),
"zippier": ("zippy",), "zippier": ("zippy",),
"zippiest": ("zippy",) "zippiest": ("zippy",),
} }


@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
ADVERBS = set(""" ADVERBS = set(
"""
'tween a.d. a.k.a. a.m. aback abaft abaxially abeam abed abjectly ably 'tween a.d. a.k.a. a.m. aback abaft abaxially abeam abed abjectly ably
abnormally aboard abominably aborad abortively about above aboveboard abreast abnormally aboard abominably aborad abortively about above aboveboard abreast
abroad abruptly absently absentmindedly absolutely abstemiously abstractedly abroad abruptly absently absentmindedly absolutely abstemiously abstractedly
@ -540,4 +541,5 @@ wordlessly worriedly worryingly worse worst worthily worthlessly wrathfully
wretchedly wrong wrongfully wrongheadedly wrongly wryly yea yeah yearly wretchedly wrong wrongfully wrongheadedly wrongly wryly yea yeah yearly
yearningly yesterday yet yieldingly yon yonder youthfully zealously zestfully yearningly yesterday yet yieldingly yon yonder youthfully zealously zestfully
zestily zigzag zestily zigzag
""".split()) """.split()
)


@ -9,5 +9,5 @@ ADVERBS_IRREG = {
"farther": ("far",), "farther": ("far",),
"further": ("far",), "further": ("far",),
"harder": ("hard",), "harder": ("hard",),
"hardest": ("hard",) "hardest": ("hard",),
} }


@ -2,12 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
ADJECTIVE_RULES = [ ADJECTIVE_RULES = [["er", ""], ["est", ""], ["er", "e"], ["est", "e"]]
["er", ""],
["est", ""],
["er", "e"],
["est", "e"]
]
NOUN_RULES = [ NOUN_RULES = [
@ -19,7 +14,7 @@ NOUN_RULES = [
["ches", "ch"], ["ches", "ch"],
["shes", "sh"], ["shes", "sh"],
["men", "man"], ["men", "man"],
["ies", "y"] ["ies", "y"],
] ]
@ -31,13 +26,8 @@ VERB_RULES = [
["ed", "e"], ["ed", "e"],
["ed", ""], ["ed", ""],
["ing", "e"], ["ing", "e"],
["ing", ""] ["ing", ""],
] ]
PUNCT_RULES = [ PUNCT_RULES = [["", '"'], ["", '"'], ["\u2018", "'"], ["\u2019", "'"]]
["", "\""],
["", "\""],
["\u2018", "'"],
["\u2019", "'"]
]


@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
NOUNS = set(""" NOUNS = set(
"""
'hood .22 0 1 1-dodecanol 1-hitter 10 100 1000 10000 100000 1000000 1000000000 'hood .22 0 1 1-dodecanol 1-hitter 10 100 1000 10000 100000 1000000 1000000000
1000000000000 11 11-plus 12 120 13 14 144 15 1530s 16 17 1728 1750s 1760s 1770s 1000000000000 11 11-plus 12 120 13 14 144 15 1530s 16 17 1728 1750s 1760s 1770s
1780s 1790s 18 1820s 1830s 1840s 1850s 1860s 1870s 1880s 1890s 19 1900s 1920s 1780s 1790s 18 1820s 1830s 1840s 1850s 1860s 1870s 1880s 1890s 19 1900s 1920s
@ -7110,4 +7111,5 @@ zurvanism zweig zwieback zwingli zworykin zydeco zygnema zygnemales
zygnemataceae zygnematales zygocactus zygoma zygomatic zygomycetes zygomycota zygnemataceae zygnematales zygocactus zygoma zygomatic zygomycetes zygomycota
zygomycotina zygophyllaceae zygophyllum zygoptera zygospore zygote zygotene zygomycotina zygophyllaceae zygophyllum zygoptera zygospore zygote zygotene
zyloprim zymase zymogen zymology zymolysis zymosis zymurgy zyrian zyloprim zymase zymogen zymology zymolysis zymosis zymurgy zyrian
""".split()) """.split()
)


@ -2,7 +2,8 @@
from __future__ import unicode_literals from __future__ import unicode_literals
VERBS = set(""" VERBS = set(
"""
aah abacinate abandon abase abash abate abbreviate abdicate abduce abduct aah abacinate abandon abase abash abate abbreviate abdicate abduce abduct
aberrate abet abhor abide abjure ablactate ablate abnegate abolish abominate aberrate abet abhor abide abjure ablactate ablate abnegate abolish abominate
abort abound about-face abrade abrase abreact abridge abrogate abscise abscond abort abound about-face abrade abrase abreact abridge abrogate abscise abscond
@ -912,4 +913,5 @@ wreck wrench wrest wrestle wrick wriggle wring wrinkle write writhe wrong x-ray
xerox yacht yack yak yammer yank yap yarn yarn-dye yaup yaw yawl yawn yawp yearn xerox yacht yack yak yammer yank yap yarn yarn-dye yaup yaw yawl yawn yawp yearn
yell yellow yelp yen yield yip yodel yoke yowl zap zero zest zigzag zinc zip yell yellow yelp yen yield yip yodel yoke yowl zap zero zest zigzag zinc zip
zipper zone zoom zipper zone zoom
""".split()) """.split()
)


@ -4,22 +4,54 @@ from __future__ import unicode_literals
from ...attrs import LIKE_NUM from ...attrs import LIKE_NUM
_num_words = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', _num_words = [
'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', "zero",
'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen', 'twenty', "one",
'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety', "two",
'hundred', 'thousand', 'million', 'billion', 'trillion', 'quadrillion', "three",
'gajillion', 'bazillion'] "four",
"five",
"six",
"seven",
"eight",
"nine",
"ten",
"eleven",
"twelve",
"thirteen",
"fourteen",
"fifteen",
"sixteen",
"seventeen",
"eighteen",
"nineteen",
"twenty",
"thirty",
"forty",
"fifty",
"sixty",
"seventy",
"eighty",
"ninety",
"hundred",
"thousand",
"million",
"billion",
"trillion",
"quadrillion",
"gajillion",
"bazillion",
]
def like_num(text): def like_num(text):
if text.startswith(('+', '-', '±', '~')): if text.startswith(("+", "-", "±", "~")):
text = text[1:] text = text[1:]
text = text.replace(',', '').replace('.', '') text = text.replace(",", "").replace(".", "")
if text.isdigit(): if text.isdigit():
return True return True
if text.count('/') == 1: if text.count("/") == 1:
num, denom = text.split('/') num, denom = text.split("/")
if num.isdigit() and denom.isdigit(): if num.isdigit() and denom.isdigit():
return True return True
if text.lower() in _num_words: if text.lower() in _num_words:
@ -27,6 +59,4 @@ def like_num(text):
return False return False
LEX_ATTRS = { LEX_ATTRS = {LIKE_NUM: like_num}
LIKE_NUM: like_num
}
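The English `like_num` mirrors the Greek one; since it lives in `spacy/lang/en/lex_attrs.py`, it can be called directly. A quick sketch:

```python
# Hedged sketch: direct calls against the like_num helper reformatted above.
from spacy.lang.en.lex_attrs import like_num

print(like_num("10,000"))  # True: commas/periods are stripped, the rest is digits
print(like_num("11/2"))    # True: both sides of the fraction are digits
print(like_num("eleven"))  # True: listed in _num_words
print(like_num("banana"))  # False
```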


@ -6,66 +6,321 @@ from ...symbols import LEMMA, PRON_LEMMA
MORPH_RULES = { MORPH_RULES = {
"PRP": { "PRP": {
"I": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Case": "Nom"}, "I": {
"me": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Case": "Acc"}, LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Sing",
"Case": "Nom",
},
"me": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Sing",
"Case": "Acc",
},
"you": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two"}, "you": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two"},
"he": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Masc", "Case": "Nom"}, "he": {
"him": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Masc", "Case": "Acc"}, LEMMA: PRON_LEMMA,
"she": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Fem", "Case": "Nom"}, "PronType": "Prs",
"her": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Fem", "Case": "Acc"}, "Person": "Three",
"it": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Neut"}, "Number": "Sing",
"we": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Case": "Nom"}, "Gender": "Masc",
"us": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Case": "Acc"}, "Case": "Nom",
"they": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Case": "Nom"}, },
"them": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Case": "Acc"}, "him": {
LEMMA: PRON_LEMMA,
"mine": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Poss": "Yes", "Reflex": "Yes"}, "PronType": "Prs",
"his": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Masc", "Poss": "Yes", "Reflex": "Yes"}, "Person": "Three",
"hers": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Fem", "Poss": "Yes", "Reflex": "Yes"}, "Number": "Sing",
"its": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Gender": "Neut", "Poss": "Yes", "Reflex": "Yes"}, "Gender": "Masc",
"ours": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Poss": "Yes", "Reflex": "Yes"}, "Case": "Acc",
"yours": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Number": "Plur", "Poss": "Yes", "Reflex": "Yes"}, },
"theirs": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Poss": "Yes", "Reflex": "Yes"}, "she": {
LEMMA: PRON_LEMMA,
"myself": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Sing", "Case": "Acc", "Reflex": "Yes"}, "PronType": "Prs",
"yourself": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Case": "Acc", "Reflex": "Yes"}, "Person": "Three",
"himself": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Acc", "Gender": "Masc", "Reflex": "Yes"}, "Number": "Sing",
"herself": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Acc", "Gender": "Fem", "Reflex": "Yes"}, "Gender": "Fem",
"itself": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Acc", "Gender": "Neut", "Reflex": "Yes"}, "Case": "Nom",
"themself": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Sing", "Case": "Acc", "Reflex": "Yes"}, },
"ourselves": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "One", "Number": "Plur", "Case": "Acc", "Reflex": "Yes"}, "her": {
"yourselves": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Two", "Case": "Acc", "Reflex": "Yes"}, LEMMA: PRON_LEMMA,
"themselves": {LEMMA: PRON_LEMMA, "PronType": "Prs", "Person": "Three", "Number": "Plur", "Case": "Acc", "Reflex": "Yes"} "PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Fem",
"Case": "Acc",
},
"it": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Neut",
},
"we": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Case": "Nom",
},
"us": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Case": "Acc",
},
"they": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Case": "Nom",
},
"them": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Case": "Acc",
},
"mine": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Sing",
"Poss": "Yes",
"Reflex": "Yes",
},
"his": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Masc",
"Poss": "Yes",
"Reflex": "Yes",
},
"hers": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Fem",
"Poss": "Yes",
"Reflex": "Yes",
},
"its": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Gender": "Neut",
"Poss": "Yes",
"Reflex": "Yes",
},
"ours": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Poss": "Yes",
"Reflex": "Yes",
},
"yours": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Number": "Plur",
"Poss": "Yes",
"Reflex": "Yes",
},
"theirs": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Poss": "Yes",
"Reflex": "Yes",
},
"myself": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Sing",
"Case": "Acc",
"Reflex": "Yes",
},
"yourself": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Case": "Acc",
"Reflex": "Yes",
},
"himself": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Acc",
"Gender": "Masc",
"Reflex": "Yes",
},
"herself": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Acc",
"Gender": "Fem",
"Reflex": "Yes",
},
"itself": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Acc",
"Gender": "Neut",
"Reflex": "Yes",
},
"themself": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Sing",
"Case": "Acc",
"Reflex": "Yes",
},
"ourselves": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "One",
"Number": "Plur",
"Case": "Acc",
"Reflex": "Yes",
},
"yourselves": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Two",
"Case": "Acc",
"Reflex": "Yes",
},
"themselves": {
LEMMA: PRON_LEMMA,
"PronType": "Prs",
"Person": "Three",
"Number": "Plur",
"Case": "Acc",
"Reflex": "Yes",
},
}, },
"PRP$": { "PRP$": {
"my": {LEMMA: PRON_LEMMA, "Person": "One", "Number": "Sing", "PronType": "Prs", "Poss": "Yes"}, "my": {
LEMMA: PRON_LEMMA,
"Person": "One",
"Number": "Sing",
"PronType": "Prs",
"Poss": "Yes",
},
"your": {LEMMA: PRON_LEMMA, "Person": "Two", "PronType": "Prs", "Poss": "Yes"}, "your": {LEMMA: PRON_LEMMA, "Person": "Two", "PronType": "Prs", "Poss": "Yes"},
"his": {LEMMA: PRON_LEMMA, "Person": "Three", "Number": "Sing", "Gender": "Masc", "PronType": "Prs", "Poss": "Yes"}, "his": {
"her": {LEMMA: PRON_LEMMA, "Person": "Three", "Number": "Sing", "Gender": "Fem", "PronType": "Prs", "Poss": "Yes"}, LEMMA: PRON_LEMMA,
"its": {LEMMA: PRON_LEMMA, "Person": "Three", "Number": "Sing", "Gender": "Neut", "PronType": "Prs", "Poss": "Yes"}, "Person": "Three",
"our": {LEMMA: PRON_LEMMA, "Person": "One", "Number": "Plur", "PronType": "Prs", "Poss": "Yes"}, "Number": "Sing",
"their": {LEMMA: PRON_LEMMA, "Person": "Three", "Number": "Plur", "PronType": "Prs", "Poss": "Yes"} "Gender": "Masc",
"PronType": "Prs",
"Poss": "Yes",
},
"her": {
LEMMA: PRON_LEMMA,
"Person": "Three",
"Number": "Sing",
"Gender": "Fem",
"PronType": "Prs",
"Poss": "Yes",
},
"its": {
LEMMA: PRON_LEMMA,
"Person": "Three",
"Number": "Sing",
"Gender": "Neut",
"PronType": "Prs",
"Poss": "Yes",
},
"our": {
LEMMA: PRON_LEMMA,
"Person": "One",
"Number": "Plur",
"PronType": "Prs",
"Poss": "Yes",
},
"their": {
LEMMA: PRON_LEMMA,
"Person": "Three",
"Number": "Plur",
"PronType": "Prs",
"Poss": "Yes",
},
}, },
"VBZ": { "VBZ": {
"am": {LEMMA: "be", "VerbForm": "Fin", "Person": "One", "Tense": "Pres", "Mood": "Ind"}, "am": {
"are": {LEMMA: "be", "VerbForm": "Fin", "Person": "Two", "Tense": "Pres", "Mood": "Ind"}, LEMMA: "be",
"is": {LEMMA: "be", "VerbForm": "Fin", "Person": "Three", "Tense": "Pres", "Mood": "Ind"}, "VerbForm": "Fin",
"'re": {LEMMA: "be", "VerbForm": "Fin", "Person": "Two", "Tense": "Pres", "Mood": "Ind"}, "Person": "One",
"'s": {LEMMA: "be", "VerbForm": "Fin", "Person": "Three", "Tense": "Pres", "Mood": "Ind"}, "Tense": "Pres",
"Mood": "Ind",
},
"are": {
LEMMA: "be",
"VerbForm": "Fin",
"Person": "Two",
"Tense": "Pres",
"Mood": "Ind",
},
"is": {
LEMMA: "be",
"VerbForm": "Fin",
"Person": "Three",
"Tense": "Pres",
"Mood": "Ind",
},
"'re": {
LEMMA: "be",
"VerbForm": "Fin",
"Person": "Two",
"Tense": "Pres",
"Mood": "Ind",
},
"'s": {
LEMMA: "be",
"VerbForm": "Fin",
"Person": "Three",
"Tense": "Pres",
"Mood": "Ind",
},
}, },
"VBP": { "VBP": {
"are": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Pres", "Mood": "Ind"}, "are": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Pres", "Mood": "Ind"},
"'re": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Pres", "Mood": "Ind"}, "'re": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Pres", "Mood": "Ind"},
"am": {LEMMA: "be", "VerbForm": "Fin", "Person": "One", "Tense": "Pres", "Mood": "Ind"}, "am": {
LEMMA: "be",
"VerbForm": "Fin",
"Person": "One",
"Tense": "Pres",
"Mood": "Ind",
},
}, },
"VBD": { "VBD": {
"was": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Past", "Number": "Sing"}, "was": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Past", "Number": "Sing"},
"were": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Past", "Number": "Plur"} "were": {LEMMA: "be", "VerbForm": "Fin", "Tense": "Past", "Number": "Plur"},
} },
} }
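`MORPH_RULES` is a plain nested dict keyed by fine-grained tag and surface form, so it can be inspected directly; a hedged sketch (the `LEMMA` key is the spaCy symbol, whose value here is the shared `-PRON-` lemma):

```python
# Hedged sketch: MORPH_RULES maps tag + surface form to the lemma and
# morphological features the tagger assigns.
from spacy.lang.en.morph_rules import MORPH_RULES

features = MORPH_RULES["PRP"]["she"]
print(features)
# expected: the LEMMA symbol -> "-PRON-" plus
# PronType=Prs, Person=Three, Number=Sing, Gender=Fem, Case=Nom
```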


@ -12,7 +12,6 @@ _exc = {
"plz": "please", "plz": "please",
"pls": "please", "pls": "please",
"thx": "thanks", "thx": "thanks",
# US vs. UK spelling # US vs. UK spelling
"accessorise": "accessorize", "accessorise": "accessorize",
"accessorised": "accessorized", "accessorised": "accessorized",
@ -1758,7 +1757,7 @@ _exc = {
"yoghourt": "yogurt", "yoghourt": "yogurt",
"yoghourts": "yogurts", "yoghourts": "yogurts",
"yoghurt": "yogurt", "yoghurt": "yogurt",
"yoghurts": "yogurts" "yoghurts": "yogurts",
} }
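These norms feed the NORM lexical attribute via the `add_lookups` call in the English `__init__` shown earlier, so they are visible on a blank pipeline. A hedged check using two entries from this diff:

```python
# Hedged sketch: NORM exceptions normalise slang and British spellings, and the
# result shows up on token.norm_ without any trained model.
from spacy.lang.en import English

nlp = English()
doc = nlp("plz accessorise it")
print([(t.text, t.norm_) for t in doc])
# "plz" should normalise to "please" and "accessorise" to "accessorize"
```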


@ -3,8 +3,8 @@ from __future__ import unicode_literals
# Stop words # Stop words
STOP_WORDS = set(
STOP_WORDS = set(""" """
a about above across after afterwards again against all almost alone along a about above across after afterwards again against all almost alone along
already also although always am among amongst amount an and another any anyhow already also although always am among amongst amount an and another any anyhow
anyone anything anyway anywhere are around as at anyone anything anyway anywhere are around as at
@ -68,4 +68,5 @@ whither who whoever whole whom whose why will with within without would
yet you your yours yourself yourselves yet you your yours yourself yourselves
'd 'll 'm 're 's 've 'd 'll 'm 're 's 've
""".split()) """.split()
)


@ -8,12 +8,21 @@ def noun_chunks(obj):
""" """
Detect base noun phrases from a dependency parse. Works on both Doc and Span. Detect base noun phrases from a dependency parse. Works on both Doc and Span.
""" """
labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj', 'dative', 'appos', labels = [
'attr', 'ROOT'] "nsubj",
"dobj",
"nsubjpass",
"pcomp",
"pobj",
"dative",
"appos",
"attr",
"ROOT",
]
doc = obj.doc # Ensure works on both Doc and Span. doc = obj.doc # Ensure works on both Doc and Span.
np_deps = [doc.vocab.strings.add(label) for label in labels] np_deps = [doc.vocab.strings.add(label) for label in labels]
conj = doc.vocab.strings.add('conj') conj = doc.vocab.strings.add("conj")
np_label = doc.vocab.strings.add('NP') np_label = doc.vocab.strings.add("NP")
seen = set() seen = set()
for i, word in enumerate(obj): for i, word in enumerate(obj):
if word.pos not in (NOUN, PROPN, PRON): if word.pos not in (NOUN, PROPN, PRON):
@ -38,6 +47,4 @@ def noun_chunks(obj):
yield word.left_edge.i, word.i + 1, np_label yield word.left_edge.i, word.i + 1, np_label
SYNTAX_ITERATORS = { SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}
'noun_chunks': noun_chunks
}
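`noun_chunks` walks the dependency parse, so exercising it needs a trained pipeline; this sketch assumes a model such as `en_core_web_sm` is installed (the model name is an example, not part of this diff):

```python
# Hedged sketch: noun_chunks consumes the dependency parse produced by a
# trained English model (en_core_web_sm is only an example name).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Autonomous cars shift insurance liability toward manufacturers")
print([chunk.text for chunk in doc.noun_chunks])
# expected chunks: "Autonomous cars", "insurance liability", "manufacturers"
```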


@@ -11,7 +11,7 @@ TAG_MAP = {
     "-LRB-": {POS: PUNCT, "PunctType": "brck", "PunctSide": "ini"},
     "-RRB-": {POS: PUNCT, "PunctType": "brck", "PunctSide": "fin"},
     "``": {POS: PUNCT, "PunctType": "quot", "PunctSide": "ini"},
-    "\"\"": {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"},
+    '""': {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"},
     "''": {POS: PUNCT, "PunctType": "quot", "PunctSide": "fin"},
     ":": {POS: PUNCT},
     "$": {POS: SYM, "Other": {"SymType": "currency"}},
@@ -51,7 +51,13 @@ TAG_MAP = {
     "VBG": {POS: VERB, "VerbForm": "part", "Tense": "pres", "Aspect": "prog"},
     "VBN": {POS: VERB, "VerbForm": "part", "Tense": "past", "Aspect": "perf"},
     "VBP": {POS: VERB, "VerbForm": "fin", "Tense": "pres"},
-    "VBZ": {POS: VERB, "VerbForm": "fin", "Tense": "pres", "Number": "sing", "Person": 3},
+    "VBZ": {
+        POS: VERB,
+        "VerbForm": "fin",
+        "Tense": "pres",
+        "Number": "sing",
+        "Person": 3,
+    },
     "WDT": {POS: ADJ, "PronType": "int|rel"},
     "WP": {POS: NOUN, "PronType": "int|rel"},
     "WP$": {POS: ADJ, "Poss": "yes", "PronType": "int|rel"},

View File

@@ -5,103 +5,143 @@ from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
 _exc = {}
-_exclude = ["Ill", "ill", "Its", "its", "Hell", "hell", "Shell", "shell",
-            "Shed", "shed", "were", "Were", "Well", "well", "Whore", "whore"]
+_exclude = [
+    "Ill",
+    "ill",
+    "Its",
+    "its",
+    "Hell",
+    "hell",
+    "Shell",
+    "shell",
+    "Shed",
+    "shed",
+    "were",
+    "Were",
+    "Well",
+    "well",
+    "Whore",
+    "whore",
+]
 # Pronouns
 for pron in ["i"]:
     for orth in [pron, pron.title()]:
         _exc[orth + "'m"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'m", LEMMA: "be", NORM: "am", TAG: "VBP", "tenspect": 1, "number": 1}]
+            {
+                ORTH: "'m",
+                LEMMA: "be",
+                NORM: "am",
+                TAG: "VBP",
+                "tenspect": 1,
+                "number": 1,
+            },
+        ]
         _exc[orth + "m"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "m", LEMMA: "be", TAG: "VBP", "tenspect": 1, "number": 1 }]
+            {ORTH: "m", LEMMA: "be", TAG: "VBP", "tenspect": 1, "number": 1},
+        ]
         _exc[orth + "'ma"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
             {ORTH: "'m", LEMMA: "be", NORM: "am"},
-            {ORTH: "a", LEMMA: "going to", NORM: "gonna"}]
+            {ORTH: "a", LEMMA: "going to", NORM: "gonna"},
+        ]
         _exc[orth + "ma"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
             {ORTH: "m", LEMMA: "be", NORM: "am"},
-            {ORTH: "a", LEMMA: "going to", NORM: "gonna"}]
+            {ORTH: "a", LEMMA: "going to", NORM: "gonna"},
+        ]
 for pron in ["i", "you", "he", "she", "it", "we", "they"]:
     for orth in [pron, pron.title()]:
         _exc[orth + "'ll"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'ll", LEMMA: "will", NORM: "will", TAG: "MD"}]
+            {ORTH: "'ll", LEMMA: "will", NORM: "will", TAG: "MD"},
+        ]
         _exc[orth + "ll"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "ll", LEMMA: "will", NORM: "will", TAG: "MD"}]
+            {ORTH: "ll", LEMMA: "will", NORM: "will", TAG: "MD"},
+        ]
         _exc[orth + "'ll've"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
             {ORTH: "'ll", LEMMA: "will", NORM: "will", TAG: "MD"},
-            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "llve"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
             {ORTH: "ll", LEMMA: "will", NORM: "will", TAG: "MD"},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "'d"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'d", LEMMA: "would", NORM: "would", TAG: "MD"}]
+            {ORTH: "'d", LEMMA: "would", NORM: "would", TAG: "MD"},
+        ]
         _exc[orth + "d"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "d", LEMMA: "would", NORM: "would", TAG: "MD"}]
+            {ORTH: "d", LEMMA: "would", NORM: "would", TAG: "MD"},
+        ]
         _exc[orth + "'d've"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
             {ORTH: "'d", LEMMA: "would", NORM: "would", TAG: "MD"},
-            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "dve"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
             {ORTH: "d", LEMMA: "would", NORM: "would", TAG: "MD"},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
 for pron in ["i", "you", "we", "they"]:
     for orth in [pron, pron.title()]:
         _exc[orth + "'ve"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "ve"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
 for pron in ["you", "we", "they"]:
     for orth in [pron, pron.title()]:
         _exc[orth + "'re"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'re", LEMMA: "be", NORM: "are"}]
+            {ORTH: "'re", LEMMA: "be", NORM: "are"},
+        ]
         _exc[orth + "re"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "re", LEMMA: "be", NORM: "are", TAG: "VBZ"}]
+            {ORTH: "re", LEMMA: "be", NORM: "are", TAG: "VBZ"},
+        ]
 for pron in ["he", "she", "it"]:
     for orth in [pron, pron.title()]:
         _exc[orth + "'s"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'s", NORM: "'s"}]
+            {ORTH: "'s", NORM: "'s"},
+        ]
         _exc[orth + "s"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "s"}]
+            {ORTH: "s"},
+        ]
 # W-words, relative pronouns, prepositions etc.
@@ -110,63 +150,71 @@ for word in ["who", "what", "when", "where", "why", "how", "there", "that"]:
     for orth in [word, word.title()]:
         _exc[orth + "'s"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "'s", NORM: "'s"}]
+            {ORTH: "'s", NORM: "'s"},
+        ]
-        _exc[orth + "s"] = [
-            {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "s"}]
+        _exc[orth + "s"] = [{ORTH: orth, LEMMA: word, NORM: word}, {ORTH: "s"}]
         _exc[orth + "'ll"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "'ll", LEMMA: "will", NORM: "will", TAG: "MD"}]
+            {ORTH: "'ll", LEMMA: "will", NORM: "will", TAG: "MD"},
+        ]
         _exc[orth + "ll"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "ll", LEMMA: "will", NORM: "will", TAG: "MD"}]
+            {ORTH: "ll", LEMMA: "will", NORM: "will", TAG: "MD"},
+        ]
         _exc[orth + "'ll've"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
             {ORTH: "'ll", LEMMA: "will", NORM: "will", TAG: "MD"},
-            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "llve"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
             {ORTH: "ll", LEMMA: "will", NORM: "will", TAG: "MD"},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "'re"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "'re", LEMMA: "be", NORM: "are"}]
+            {ORTH: "'re", LEMMA: "be", NORM: "are"},
+        ]
         _exc[orth + "re"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "re", LEMMA: "be", NORM: "are"}]
+            {ORTH: "re", LEMMA: "be", NORM: "are"},
+        ]
         _exc[orth + "'ve"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "'ve", LEMMA: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", TAG: "VB"},
+        ]
         _exc[orth + "ve"] = [
             {ORTH: orth, LEMMA: word},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "'d"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "'d", NORM: "'d"}]
+            {ORTH: "'d", NORM: "'d"},
+        ]
-        _exc[orth + "d"] = [
-            {ORTH: orth, LEMMA: word, NORM: word},
-            {ORTH: "d"}]
+        _exc[orth + "d"] = [{ORTH: orth, LEMMA: word, NORM: word}, {ORTH: "d"}]
         _exc[orth + "'d've"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
             {ORTH: "'d", LEMMA: "would", NORM: "would", TAG: "MD"},
-            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[orth + "dve"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
             {ORTH: "d", LEMMA: "would", NORM: "would", TAG: "MD"},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
 # Verbs
@@ -186,27 +234,32 @@ for verb_data in [
     {ORTH: "sha", LEMMA: "shall", NORM: "shall", TAG: "MD"},
     {ORTH: "should", NORM: "should", TAG: "MD"},
     {ORTH: "wo", LEMMA: "will", NORM: "will", TAG: "MD"},
-    {ORTH: "would", NORM: "would", TAG: "MD"}]:
+    {ORTH: "would", NORM: "would", TAG: "MD"},
+]:
     verb_data_tc = dict(verb_data)
     verb_data_tc[ORTH] = verb_data_tc[ORTH].title()
     for data in [verb_data, verb_data_tc]:
         _exc[data[ORTH] + "n't"] = [
             dict(data),
-            {ORTH: "n't", LEMMA: "not", NORM: "not", TAG: "RB"}]
+            {ORTH: "n't", LEMMA: "not", NORM: "not", TAG: "RB"},
+        ]
         _exc[data[ORTH] + "nt"] = [
             dict(data),
-            {ORTH: "nt", LEMMA: "not", NORM: "not", TAG: "RB"}]
+            {ORTH: "nt", LEMMA: "not", NORM: "not", TAG: "RB"},
+        ]
         _exc[data[ORTH] + "n't've"] = [
             dict(data),
             {ORTH: "n't", LEMMA: "not", NORM: "not", TAG: "RB"},
-            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
         _exc[data[ORTH] + "ntve"] = [
             dict(data),
             {ORTH: "nt", LEMMA: "not", NORM: "not", TAG: "RB"},
-            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}]
+            {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+        ]
 for verb_data in [
@@ -214,17 +267,14 @@ for verb_data in [
     {ORTH: "might", NORM: "might", TAG: "MD"},
     {ORTH: "must", NORM: "must", TAG: "MD"},
     {ORTH: "should", NORM: "should", TAG: "MD"},
-    {ORTH: "would", NORM: "would", TAG: "MD"}]:
+    {ORTH: "would", NORM: "would", TAG: "MD"},
+]:
     verb_data_tc = dict(verb_data)
     verb_data_tc[ORTH] = verb_data_tc[ORTH].title()
     for data in [verb_data, verb_data_tc]:
-        _exc[data[ORTH] + "'ve"] = [
-            dict(data),
-            {ORTH: "'ve", LEMMA: "have", TAG: "VB"}]
+        _exc[data[ORTH] + "'ve"] = [dict(data), {ORTH: "'ve", LEMMA: "have", TAG: "VB"}]
-        _exc[data[ORTH] + "ve"] = [
-            dict(data),
-            {ORTH: "ve", LEMMA: "have", TAG: "VB"}]
+        _exc[data[ORTH] + "ve"] = [dict(data), {ORTH: "ve", LEMMA: "have", TAG: "VB"}]
 for verb_data in [
@@ -235,17 +285,20 @@ for verb_data in [
     {ORTH: "were", LEMMA: "be", NORM: "were"},
     {ORTH: "have", NORM: "have"},
     {ORTH: "has", LEMMA: "have", NORM: "has"},
-    {ORTH: "dare", NORM: "dare"}]:
+    {ORTH: "dare", NORM: "dare"},
+]:
     verb_data_tc = dict(verb_data)
     verb_data_tc[ORTH] = verb_data_tc[ORTH].title()
     for data in [verb_data, verb_data_tc]:
         _exc[data[ORTH] + "n't"] = [
             dict(data),
-            {ORTH: "n't", LEMMA: "not", NORM: "not", TAG: "RB"}]
+            {ORTH: "n't", LEMMA: "not", NORM: "not", TAG: "RB"},
+        ]
         _exc[data[ORTH] + "nt"] = [
             dict(data),
-            {ORTH: "nt", LEMMA: "not", NORM: "not", TAG: "RB"}]
+            {ORTH: "nt", LEMMA: "not", NORM: "not", TAG: "RB"},
+        ]
 # Other contractions with trailing apostrophe
@@ -256,7 +309,8 @@ for exc_data in [
     {ORTH: "nothin", LEMMA: "nothing", NORM: "nothing"},
     {ORTH: "nuthin", LEMMA: "nothing", NORM: "nothing"},
     {ORTH: "ol", LEMMA: "old", NORM: "old"},
-    {ORTH: "somethin", LEMMA: "something", NORM: "something"}]:
+    {ORTH: "somethin", LEMMA: "something", NORM: "something"},
+]:
     exc_data_tc = dict(exc_data)
     exc_data_tc[ORTH] = exc_data_tc[ORTH].title()
     for data in [exc_data, exc_data_tc]:
@@ -272,7 +326,8 @@ for exc_data in [
     {ORTH: "cause", LEMMA: "because", NORM: "because"},
     {ORTH: "em", LEMMA: PRON_LEMMA, NORM: "them"},
     {ORTH: "ll", LEMMA: "will", NORM: "will"},
-    {ORTH: "nuff", LEMMA: "enough", NORM: "enough"}]:
+    {ORTH: "nuff", LEMMA: "enough", NORM: "enough"},
+]:
     exc_data_apos = dict(exc_data)
     exc_data_apos[ORTH] = "'" + exc_data_apos[ORTH]
     for data in [exc_data, exc_data_apos]:
@@ -285,81 +340,69 @@ for h in range(1, 12 + 1):
     for period in ["a.m.", "am"]:
         _exc["%d%s" % (h, period)] = [
             {ORTH: "%d" % h},
-            {ORTH: period, LEMMA: "a.m.", NORM: "a.m."}]
+            {ORTH: period, LEMMA: "a.m.", NORM: "a.m."},
+        ]
     for period in ["p.m.", "pm"]:
         _exc["%d%s" % (h, period)] = [
             {ORTH: "%d" % h},
-            {ORTH: period, LEMMA: "p.m.", NORM: "p.m."}]
+            {ORTH: period, LEMMA: "p.m.", NORM: "p.m."},
+        ]
 # Rest
 _other_exc = {
-    "y'all": [
-        {ORTH: "y'", LEMMA: PRON_LEMMA, NORM: "you"},
-        {ORTH: "all"}],
-    "yall": [
-        {ORTH: "y", LEMMA: PRON_LEMMA, NORM: "you"},
-        {ORTH: "all"}],
+    "y'all": [{ORTH: "y'", LEMMA: PRON_LEMMA, NORM: "you"}, {ORTH: "all"}],
+    "yall": [{ORTH: "y", LEMMA: PRON_LEMMA, NORM: "you"}, {ORTH: "all"}],
     "how'd'y": [
         {ORTH: "how", LEMMA: "how"},
         {ORTH: "'d", LEMMA: "do"},
-        {ORTH: "'y", LEMMA: PRON_LEMMA, NORM: "you"}],
+        {ORTH: "'y", LEMMA: PRON_LEMMA, NORM: "you"},
+    ],
     "How'd'y": [
         {ORTH: "How", LEMMA: "how", NORM: "how"},
         {ORTH: "'d", LEMMA: "do"},
-        {ORTH: "'y", LEMMA: PRON_LEMMA, NORM: "you"}],
+        {ORTH: "'y", LEMMA: PRON_LEMMA, NORM: "you"},
+    ],
     "not've": [
         {ORTH: "not", LEMMA: "not", TAG: "RB"},
-        {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}],
+        {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+    ],
     "notve": [
         {ORTH: "not", LEMMA: "not", TAG: "RB"},
-        {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}],
+        {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+    ],
     "Not've": [
         {ORTH: "Not", LEMMA: "not", NORM: "not", TAG: "RB"},
-        {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"}],
+        {ORTH: "'ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+    ],
     "Notve": [
         {ORTH: "Not", LEMMA: "not", NORM: "not", TAG: "RB"},
-        {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"}],
+        {ORTH: "ve", LEMMA: "have", NORM: "have", TAG: "VB"},
+    ],
     "cannot": [
         {ORTH: "can", LEMMA: "can", TAG: "MD"},
-        {ORTH: "not", LEMMA: "not", TAG: "RB"}],
+        {ORTH: "not", LEMMA: "not", TAG: "RB"},
+    ],
     "Cannot": [
         {ORTH: "Can", LEMMA: "can", NORM: "can", TAG: "MD"},
-        {ORTH: "not", LEMMA: "not", TAG: "RB"}],
+        {ORTH: "not", LEMMA: "not", TAG: "RB"},
+    ],
     "gonna": [
         {ORTH: "gon", LEMMA: "go", NORM: "going"},
-        {ORTH: "na", LEMMA: "to", NORM: "to"}],
+        {ORTH: "na", LEMMA: "to", NORM: "to"},
+    ],
     "Gonna": [
         {ORTH: "Gon", LEMMA: "go", NORM: "going"},
-        {ORTH: "na", LEMMA: "to", NORM: "to"}],
+        {ORTH: "na", LEMMA: "to", NORM: "to"},
+    ],
-    "gotta": [
-        {ORTH: "got"},
-        {ORTH: "ta", LEMMA: "to", NORM: "to"}],
-    "Gotta": [
-        {ORTH: "Got", NORM: "got"},
-        {ORTH: "ta", LEMMA: "to", NORM: "to"}],
-    "let's": [
-        {ORTH: "let"},
-        {ORTH: "'s", LEMMA: PRON_LEMMA, NORM: "us"}],
+    "gotta": [{ORTH: "got"}, {ORTH: "ta", LEMMA: "to", NORM: "to"}],
+    "Gotta": [{ORTH: "Got", NORM: "got"}, {ORTH: "ta", LEMMA: "to", NORM: "to"}],
+    "let's": [{ORTH: "let"}, {ORTH: "'s", LEMMA: PRON_LEMMA, NORM: "us"}],
     "Let's": [
         {ORTH: "Let", LEMMA: "let", NORM: "let"},
-        {ORTH: "'s", LEMMA: PRON_LEMMA, NORM: "us"}]
+        {ORTH: "'s", LEMMA: PRON_LEMMA, NORM: "us"},
+    ],
 }
 _exc.update(_other_exc)
@@ -402,8 +445,6 @@ for exc_data in [
     {ORTH: "Goin'", LEMMA: "go", NORM: "going"},
     {ORTH: "goin", LEMMA: "go", NORM: "going"},
     {ORTH: "Goin", LEMMA: "go", NORM: "going"},
     {ORTH: "Mt.", LEMMA: "Mount", NORM: "Mount"},
     {ORTH: "Ak.", LEMMA: "Alaska", NORM: "Alaska"},
     {ORTH: "Ala.", LEMMA: "Alabama", NORM: "Alabama"},
@@ -456,15 +497,47 @@ for exc_data in [
     {ORTH: "Tenn.", LEMMA: "Tennessee", NORM: "Tennessee"},
     {ORTH: "Va.", LEMMA: "Virginia", NORM: "Virginia"},
     {ORTH: "Wash.", LEMMA: "Washington", NORM: "Washington"},
-    {ORTH: "Wis.", LEMMA: "Wisconsin", NORM: "Wisconsin"}]:
+    {ORTH: "Wis.", LEMMA: "Wisconsin", NORM: "Wisconsin"},
+]:
     _exc[exc_data[ORTH]] = [exc_data]
 for orth in [
-    "'d", "a.m.", "Adm.", "Bros.", "co.", "Co.", "Corp.", "D.C.", "Dr.", "e.g.",
-    "E.g.", "E.G.", "Gen.", "Gov.", "i.e.", "I.e.", "I.E.", "Inc.", "Jr.",
-    "Ltd.", "Md.", "Messrs.", "Mo.", "Mont.", "Mr.", "Mrs.", "Ms.", "p.m.",
-    "Ph.D.", "Rep.", "Rev.", "Sen.", "St.", "vs."]:
+    "'d",
+    "a.m.",
+    "Adm.",
+    "Bros.",
+    "co.",
+    "Co.",
+    "Corp.",
+    "D.C.",
+    "Dr.",
+    "e.g.",
+    "E.g.",
+    "E.G.",
+    "Gen.",
+    "Gov.",
+    "i.e.",
+    "I.e.",
+    "I.E.",
+    "Inc.",
+    "Jr.",
+    "Ltd.",
+    "Md.",
+    "Messrs.",
+    "Mo.",
+    "Mont.",
+    "Mr.",
+    "Mrs.",
+    "Ms.",
+    "p.m.",
+    "Ph.D.",
+    "Rep.",
+    "Rev.",
+    "Sen.",
+    "St.",
+    "vs.",
+]:
     _exc[orth] = [{ORTH: orth}]

View File

@@ -30,8 +30,9 @@ for name, tag, patterns in [
     ("Facebook", "ORG", [[{LOWER: "facebook"}]]),
     ("Blizzard", "ORG", [[{LOWER: "blizzard"}]]),
     ("Ubuntu", "ORG", [[{LOWER: "ubuntu"}]]),
-    ("YouTube", "PRODUCT", [[{LOWER: "youtube"}]]),]:
-    ENTITY_RULES.append({ENT_ID: name, 'attrs': {ENT_TYPE: tag}, 'patterns': patterns})
+    ("YouTube", "PRODUCT", [[{LOWER: "youtube"}]]),
+]:
+    ENTITY_RULES.append({ENT_ID: name, "attrs": {ENT_TYPE: tag}, "patterns": patterns})
 FALSE_POSITIVES = [
@@ -46,5 +47,5 @@ FALSE_POSITIVES = [
     [{ORTH: "Yay"}],
     [{ORTH: "Ahh"}],
     [{ORTH: "Yea"}],
-    [{ORTH: "Bah"}]
+    [{ORTH: "Bah"}],
 ]

View File

@@ -16,8 +16,10 @@ from ...util import update_exc, add_lookups
 class SpanishDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
-    lex_attr_getters[LANG] = lambda text: 'es'
+    lex_attr_getters[LANG] = lambda text: "es"
-    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
+    )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     tag_map = TAG_MAP
     stop_words = STOP_WORDS
@@ -26,8 +28,8 @@ class SpanishDefaults(Language.Defaults):
 class Spanish(Language):
-    lang = 'es'
+    lang = "es"
     Defaults = SpanishDefaults
-__all__ = ['Spanish']
+__all__ = ["Spanish"]

View File

@@ -18,5 +18,5 @@ sentences = [
     "El gato come pescado",
     "Veo al hombre con el telescopio",
     "La araña come moscas",
-    "El pingüino incuba en su nido"
+    "El pingüino incuba en su nido",
 ]

View File

@@ -2,7 +2,8 @@
 from __future__ import unicode_literals
-STOP_WORDS = set("""
+STOP_WORDS = set(
+    """
 actualmente acuerdo adelante ademas además adrede afirmó agregó ahi ahora ahí
 al algo alguna algunas alguno algunos algún alli allí alrededor ambos ampleamos
 antano antaño ante anterior antes apenas aproximadamente aquel aquella aquellas
@@ -81,4 +82,5 @@ va vais valor vamos van varias varios vaya veces ver verdad verdadera verdadero
 vez vosotras vosotros voy vuestra vuestras vuestro vuestros
 ya yo
-""".split())
+""".split()
+)

View File

@@ -8,17 +8,19 @@ def noun_chunks(obj):
     doc = obj.doc
     if not len(doc):
         return
-    np_label = doc.vocab.strings.add('NP')
-    left_labels = ['det', 'fixed', 'neg'] #['nunmod', 'det', 'appos', 'fixed']
-    right_labels = ['flat', 'fixed', 'compound', 'neg']
-    stop_labels = ['punct']
+    np_label = doc.vocab.strings.add("NP")
+    left_labels = ["det", "fixed", "neg"]  # ['nunmod', 'det', 'appos', 'fixed']
+    right_labels = ["flat", "fixed", "compound", "neg"]
+    stop_labels = ["punct"]
     np_left_deps = [doc.vocab.strings.add(label) for label in left_labels]
     np_right_deps = [doc.vocab.strings.add(label) for label in right_labels]
     stop_deps = [doc.vocab.strings.add(label) for label in stop_labels]
     token = doc[0]
     while token and token.i < len(doc):
         if token.pos in [PROPN, NOUN, PRON]:
-            left, right = noun_bounds(doc, token, np_left_deps, np_right_deps, stop_deps)
+            left, right = noun_bounds(
+                doc, token, np_left_deps, np_right_deps, stop_deps
+            )
             yield left.i, right.i + 1, np_label
             token = right
         token = next_token(token)
@@ -31,7 +33,7 @@ def is_verb_token(token):
 def next_token(token):
     try:
         return token.nbor()
-    except:
+    except IndexError:
         return None
@@ -42,16 +44,20 @@ def noun_bounds(doc, root, np_left_deps, np_right_deps, stop_deps):
             left_bound = token
     right_bound = root
     for token in root.rights:
-        if (token.dep in np_right_deps):
-            left, right = noun_bounds(doc, token, np_left_deps, np_right_deps, stop_deps)
-            if list(filter(lambda t: is_verb_token(t) or t.dep in stop_deps,
-                           doc[left_bound.i: right.i])):
+        if token.dep in np_right_deps:
+            left, right = noun_bounds(
+                doc, token, np_left_deps, np_right_deps, stop_deps
+            )
+            if list(
+                filter(
+                    lambda t: is_verb_token(t) or t.dep in stop_deps,
+                    doc[left_bound.i : right.i],
+                )
+            ):
                 break
             else:
                 right_bound = right
     return left_bound, right_bound
-SYNTAX_ITERATORS = {
-    'noun_chunks': noun_chunks
-}
+SYNTAX_ITERATORS = {"noun_chunks": noun_chunks}

View File

@@ -4,7 +4,7 @@ from __future__ import unicode_literals
 from ...symbols import POS, PUNCT, SYM, ADJ, NUM, DET, ADV, ADP, X, VERB
 from ...symbols import NOUN, PROPN, PART, INTJ, SPACE, PRON, SCONJ, AUX, CONJ
+# fmt: off
 TAG_MAP = {
     "ADJ___": {"morph": "_", POS: ADJ},
     "ADJ__AdpType=Prep": {"morph": "AdpType=Prep", POS: ADJ},
@@ -307,3 +307,4 @@ TAG_MAP = {
     "X___": {"morph": "_", POS: X},
     "_SP": {"morph": "_", POS: SPACE},
 }
+# fmt: on

View File

@@ -1,17 +1,12 @@
 # coding: utf8
 from __future__ import unicode_literals
-from ...symbols import ORTH, LEMMA, TAG, NORM, ADP, DET, PRON_LEMMA
+from ...symbols import ORTH, LEMMA, NORM, PRON_LEMMA
 _exc = {
-    "pal": [
-        {ORTH: "pa", LEMMA: "para"},
-        {ORTH: "l", LEMMA: "el", NORM: "el"}],
-    "pala": [
-        {ORTH: "pa", LEMMA: "para"},
-        {ORTH: "la", LEMMA: "la", NORM: "la"}]
+    "pal": [{ORTH: "pa", LEMMA: "para"}, {ORTH: "l", LEMMA: "el", NORM: "el"}],
+    "pala": [{ORTH: "pa", LEMMA: "para"}, {ORTH: "la", LEMMA: "la", NORM: "la"}],
 }
@@ -24,32 +19,50 @@ for exc_data in [
     {ORTH: "Ud.", LEMMA: PRON_LEMMA, NORM: "usted"},
     {ORTH: "Vd.", LEMMA: PRON_LEMMA, NORM: "usted"},
     {ORTH: "Uds.", LEMMA: PRON_LEMMA, NORM: "ustedes"},
-    {ORTH: "Vds.", LEMMA: PRON_LEMMA, NORM: "ustedes"}]:
+    {ORTH: "Vds.", LEMMA: PRON_LEMMA, NORM: "ustedes"},
+]:
     _exc[exc_data[ORTH]] = [exc_data]
 # Times
-_exc["12m."] = [
-    {ORTH: "12"},
-    {ORTH: "m.", LEMMA: "p.m."}]
+_exc["12m."] = [{ORTH: "12"}, {ORTH: "m.", LEMMA: "p.m."}]
 for h in range(1, 12 + 1):
     for period in ["a.m.", "am"]:
-        _exc["%d%s" % (h, period)] = [
-            {ORTH: "%d" % h},
-            {ORTH: period, LEMMA: "a.m."}]
+        _exc["%d%s" % (h, period)] = [{ORTH: "%d" % h}, {ORTH: period, LEMMA: "a.m."}]
     for period in ["p.m.", "pm"]:
-        _exc["%d%s" % (h, period)] = [
-            {ORTH: "%d" % h},
-            {ORTH: period, LEMMA: "p.m."}]
+        _exc["%d%s" % (h, period)] = [{ORTH: "%d" % h}, {ORTH: period, LEMMA: "p.m."}]
 for orth in [
-    "a.C.", "a.J.C.", "apdo.", "Av.", "Avda.", "Cía.", "etc.", "Gob.", "Gral.",
-    "Ing.", "J.C.", "Lic.", "m.n.", "no.", "núm.", "P.D.", "Prof.", "Profa.",
-    "q.e.p.d.", "S.A.", "S.L.", "s.s.s.", "Sr.", "Sra.", "Srta."]:
+    "a.C.",
+    "a.J.C.",
+    "apdo.",
+    "Av.",
+    "Avda.",
+    "Cía.",
+    "etc.",
+    "Gob.",
+    "Gral.",
+    "Ing.",
+    "J.C.",
+    "Lic.",
+    "m.n.",
+    "no.",
+    "núm.",
+    "P.D.",
+    "Prof.",
+    "Profa.",
+    "q.e.p.d.",
+    "S.A.",
+    "S.L.",
+    "s.s.s.",
+    "Sr.",
+    "Sra.",
+    "Srta.",
+]:
     _exc[orth] = [{ORTH: orth}]

View File

@@ -12,11 +12,14 @@ from .tag_map import TAG_MAP
 from .punctuation import TOKENIZER_SUFFIXES
 from .lemmatizer import LEMMA_RULES, LEMMA_INDEX, LEMMA_EXC
 class PersianDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters.update(LEX_ATTRS)
-    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
-    lex_attr_getters[LANG] = lambda text: 'fa'
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
+    )
+    lex_attr_getters[LANG] = lambda text: "fa"
     tokenizer_exceptions = update_exc(TOKENIZER_EXCEPTIONS)
     lemma_rules = LEMMA_RULES
     lemma_index = LEMMA_INDEX
@@ -27,8 +30,8 @@ class PersianDefaults(Language.Defaults):
 class Persian(Language):
-    lang = 'fa'
+    lang = "fa"
     Defaults = PersianDefaults
-__all__ = ['Persian']
+__all__ = ["Persian"]

View File

@@ -12,8 +12,8 @@ Example sentences to test spaCy and its language models.
 sentences = [
     "این یک جمله نمونه می باشد.",
-    "قرار ما، امروز ساعت ۲:۳۰ بعدازظهر هست!"
+    "قرار ما، امروز ساعت ۲:۳۰ بعدازظهر هست!",
     "دیروز علی به من ۲۰۰۰.۱﷼ پول نقد داد.",
-    "چطور می‌توان از تهران به کاشان رفت؟"
-    "حدود ۸۰٪ هوا از نیتروژن تشکیل شده است."
+    "چطور می‌توان از تهران به کاشان رفت؟",
+    "حدود ۸۰٪ هوا از نیتروژن تشکیل شده است.",
 ]

View File

@@ -10,23 +10,13 @@ from ._verbs_exc import VERBS_EXC
 from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES, PUNCT_RULES
-LEMMA_INDEX = {
-    'adj': ADJECTIVES,
-    'noun': NOUNS,
-    'verb': VERBS
-}
+LEMMA_INDEX = {"adj": ADJECTIVES, "noun": NOUNS, "verb": VERBS}
 LEMMA_RULES = {
-    'adj': ADJECTIVE_RULES,
-    'noun': NOUN_RULES,
-    'verb': VERB_RULES,
-    'punct': PUNCT_RULES
+    "adj": ADJECTIVE_RULES,
+    "noun": NOUN_RULES,
+    "verb": VERB_RULES,
+    "punct": PUNCT_RULES,
 }
-LEMMA_EXC = {
-    'adj': ADJECTIVES_EXC,
-    'noun': NOUNS_EXC,
-    'verb': VERBS_EXC
-}
+LEMMA_EXC = {"adj": ADJECTIVES_EXC, "noun": NOUNS_EXC, "verb": VERBS_EXC}

View File

@@ -2,7 +2,8 @@
 from __future__ import unicode_literals
-ADJECTIVES = set("""
+ADJECTIVES = set(
+    """
 صفر
 صفرم
 صفرام
@@ -2977,4 +2978,7 @@ ADJECTIVES = set("""
 یکدست
 یکروزه
 ییلاقی
-""".split('\n'))
+""".split(
+    "\n"
+)
+)

Some files were not shown because too many files have changed in this diff