spaCy/examples/keras_parikh_entailment/spacy_hook.py

import numpy as np
from keras.models import model_from_json

try:
    import cPickle as pickle
except ImportError:
    import pickle


class KerasSimilarityShim(object):
    """Pipeline component that routes similarity calls to a Keras entailment model."""

    entailment_types = ["entailment", "contradiction", "neutral"]

    @classmethod
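    def load(cls, path, nlp, max_length=100, get_features=None):
        """Load the model architecture and pickled weights from `path`.

        `path` must support the `/` operator (e.g. a pathlib.Path). The
        embedding table is rebuilt from `nlp.vocab` and inserted into the
        weight list at index 1 before the weights are set.
        """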
        if get_features is None:
            get_features = get_word_ids

        with (path / "config.json").open() as file_:
            model = model_from_json(file_.read())
        with (path / "model").open("rb") as file_:
            weights = pickle.load(file_)

        embeddings = get_embeddings(nlp.vocab)
        weights.insert(1, embeddings)
        model.set_weights(weights)

        return cls(model, get_features=get_features, max_length=max_length)

    def __init__(self, model, get_features=None, max_length=100):
        self.model = model
        self.get_features = get_features
        self.max_length = max_length

    def __call__(self, doc):
        # Install the hooks so that Doc and Span similarity calls use predict().
        doc.user_hooks["similarity"] = self.predict
        doc.user_span_hooks["similarity"] = self.predict

        return doc

    def predict(self, doc1, doc2):
        """Return the predicted entailment label and its score for (doc1, doc2)."""
        x1 = self.get_features([doc1], max_length=self.max_length)
        x2 = self.get_features([doc2], max_length=self.max_length)
        scores = self.model.predict([x1, x2])

        return self.entailment_types[scores.argmax()], scores.max()


def get_embeddings(vocab, nr_unk=100):
    # max rank + 1 rows cover the ranked lexemes; the extra +1 is for a zero
    # vector (row 0) representing sentence-final padding
    num_vectors = max(lex.rank for lex in vocab) + 2

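    # create random vectors for OOV tokens, scaled to unit L2 norm so they
    # match the normalized word vectors below (dividing by the row sum can
    # blow up when the sum is close to zero)
    oov = np.random.normal(size=(nr_unk, vocab.vectors_length))
    oov = oov / np.linalg.norm(oov, axis=1, keepdims=True)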

    vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype="float32")
    vectors[1 : nr_unk + 1] = oov
    for lex in vocab:
        if lex.has_vector and lex.vector_norm > 0:
            vectors[nr_unk + lex.rank + 1] = lex.vector / lex.vector_norm

    return vectors


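def get_word_ids(docs, max_length=100, nr_unk=100):
    """Map tokens to embedding rows: 0 is padding, 1..nr_unk are OOV buckets,
    and tokens with vectors use rank + nr_unk + 1."""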
    Xs = np.zeros((len(docs), max_length), dtype="int32")

    for i, doc in enumerate(docs):
        for j, token in enumerate(doc):
            if j == max_length:
                break
            if token.has_vector:
                Xs[i, j] = token.rank + nr_unk + 1
            else:
                Xs[i, j] = token.rank % nr_unk + 1
    return Xs
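

The row layout shared by `get_embeddings` and `get_word_ids` can be illustrated without spaCy or Keras. The following is a minimal sketch, not part of the example itself: `FakeToken`, `build_table`, `word_ids`, and the small constants are hypothetical stand-ins that model only the attributes the code above reads (`rank`, `has_vector`, `vector`). It also assumes the OOV rows are scaled to unit L2 norm and uses a seeded generator for repeatability.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for a spaCy token/lexeme (assumption of this sketch).
FakeToken = namedtuple("FakeToken", ["text", "rank", "has_vector", "vector"])

NR_UNK = 4      # number of shared OOV buckets (the code above defaults to 100)
MAX_LENGTH = 5  # fixed sequence length; shorter inputs are padded with 0
WIDTH = 3       # vector width, kept tiny for readability


def build_table(lexemes, nr_unk=NR_UNK, width=WIDTH, seed=0):
    """Row 0: padding. Rows 1..nr_unk: OOV buckets. Remaining rows: word vectors."""
    rng = np.random.default_rng(seed)
    num_vectors = max(lex.rank for lex in lexemes) + 2
    table = np.zeros((num_vectors + nr_unk, width), dtype="float32")
    oov = rng.normal(size=(nr_unk, width))
    table[1 : nr_unk + 1] = oov / np.linalg.norm(oov, axis=1, keepdims=True)
    for lex in lexemes:
        norm = np.linalg.norm(lex.vector) if lex.has_vector else 0.0
        if norm > 0:
            table[nr_unk + lex.rank + 1] = lex.vector / norm
    return table


def word_ids(doc, max_length=MAX_LENGTH, nr_unk=NR_UNK):
    """Mirror the indexing of get_word_ids for a single document."""
    ids = np.zeros((max_length,), dtype="int32")
    for j, token in enumerate(doc[:max_length]):
        if token.has_vector:
            ids[j] = token.rank + nr_unk + 1
        else:
            ids[j] = token.rank % nr_unk + 1
    return ids


doc = [
    FakeToken("the", 0, True, np.array([3.0, 0.0, 4.0])),
    FakeToken("cat", 1, True, np.array([0.0, 2.0, 0.0])),
    FakeToken("zzyzx", 9, False, np.zeros(3)),  # no vector: falls into an OOV bucket
]

table = build_table(doc)
ids = word_ids(doc)
print(ids)        # row indices fed to the embedding layer
print(table[ids]) # the vectors the model would look up
```

Because both functions apply the same offset of `nr_unk + 1`, an ID produced for a token with a vector always points at that token's normalized vector, while tokens without vectors share one of the `nr_unk` random rows.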