spaCy/examples/notebooks/Decompositional Attention.ipynb
Ines Montani f37863093a 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003)
Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉

See here: https://github.com/explosion/srsly

    Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place.

    At the same time, we noticed that having a lot of small dependencies was making maintainence harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel.

    srsly currently includes forks of the following packages:

        ujson
        msgpack
        msgpack-numpy
        cloudpickle



* WIP: replace json/ujson with srsly

* Replace ujson in examples

Use regular json instead of srsly to make code easier to read and follow

* Update requirements

* Fix imports

* Fix typos

* Replace msgpack with srsly

* Fix warning
2018-12-03 01:28:22 +01:00

956 lines
49 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Natural language inference using spaCy and Keras"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook details an implementation of the natural language inference model presented in [(Parikh et al, 2016)](https://arxiv.org/abs/1606.01933). The model is notable for the small number of paramaters *and hyperparameters* it specifices, while still yielding good performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Constructing the dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We only need the GloVe vectors from spaCy, not a full NLP pipeline."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"nlp = spacy.load('en_vectors_web_lg')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Function to load the SNLI dataset. The categories are converted to one-shot representation. The function comes from an example in spaCy."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jds/tensorflow-gpu/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" from ._conv import register_converters as _register_converters\n",
"Using TensorFlow backend.\n"
]
}
],
"source": [
"import json\n",
"from keras.utils import to_categorical\n",
"\n",
"LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}\n",
"def read_snli(path):\n",
" texts1 = []\n",
" texts2 = []\n",
" labels = []\n",
" with open(path, 'r') as file_:\n",
" for line in file_:\n",
" eg = json.loads(line)\n",
" label = eg['gold_label']\n",
" if label == '-': # per Parikh, ignore - SNLI entries\n",
" continue\n",
" texts1.append(eg['sentence1'])\n",
" texts2.append(eg['sentence2'])\n",
" labels.append(LABELS[label])\n",
" return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because Keras can do the train/test split for us, we'll load *all* SNLI triples from one file."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"texts,hypotheses,labels = read_snli('snli/snli_1.0_train.jsonl')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def create_dataset(nlp, texts, hypotheses, num_oov, max_length, norm_vectors = True):\n",
" sents = texts + hypotheses\n",
" \n",
" # the extra +1 is for a zero vector represting NULL for padding\n",
" num_vectors = max(lex.rank for lex in nlp.vocab) + 2 \n",
" \n",
" # create random vectors for OOV tokens\n",
" oov = np.random.normal(size=(num_oov, nlp.vocab.vectors_length))\n",
" oov = oov / oov.sum(axis=1, keepdims=True)\n",
" \n",
" vectors = np.zeros((num_vectors + num_oov, nlp.vocab.vectors_length), dtype='float32')\n",
" vectors[num_vectors:, ] = oov\n",
" for lex in nlp.vocab:\n",
" if lex.has_vector and lex.vector_norm > 0:\n",
" vectors[lex.rank + 1] = lex.vector / lex.vector_norm if norm_vectors == True else lex.vector\n",
" \n",
" sents_as_ids = []\n",
" for sent in sents:\n",
" doc = nlp(sent)\n",
" word_ids = []\n",
" \n",
" for i, token in enumerate(doc):\n",
" # skip odd spaces from tokenizer\n",
" if token.has_vector and token.vector_norm == 0:\n",
" continue\n",
" \n",
" if i > max_length:\n",
" break\n",
" \n",
" if token.has_vector:\n",
" word_ids.append(token.rank + 1)\n",
" else:\n",
" # if we don't have a vector, pick an OOV entry\n",
" word_ids.append(token.rank % num_oov + num_vectors) \n",
" \n",
" # there must be a simpler way of generating padded arrays from lists...\n",
" word_id_vec = np.zeros((max_length), dtype='int')\n",
" clipped_len = min(max_length, len(word_ids))\n",
" word_id_vec[:clipped_len] = word_ids[:clipped_len]\n",
" sents_as_ids.append(word_id_vec)\n",
" \n",
" \n",
" return vectors, np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"sem_vectors, text_vectors, hypothesis_vectors = create_dataset(nlp, texts, hypotheses, 100, 50, True)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"texts_test,hypotheses_test,labels_test = read_snli('snli/snli_1.0_test.jsonl')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"_, text_vectors_test, hypothesis_vectors_test = create_dataset(nlp, texts_test, hypotheses_test, 100, 50, True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use spaCy to tokenize the sentences and return, when available, a semantic vector for each token. \n",
"\n",
"OOV terms (tokens for which no semantic vector is available) are assigned to one of a set of randomly-generated OOV vectors, per (Parikh et al, 2016).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we will clip sentences to 50 words maximum."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"from keras import layers, Model, models\n",
"from keras import backend as K"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The embedding layer copies the 300-dimensional GloVe vectors into GPU memory. Per (Parikh et al, 2016), the vectors, which are not adapted during training, are projected down to lower-dimensional vectors using a trained projection matrix."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def create_embedding(vectors, max_length, projected_dim):\n",
" return models.Sequential([\n",
" layers.Embedding(\n",
" vectors.shape[0],\n",
" vectors.shape[1],\n",
" input_length=max_length,\n",
" weights=[vectors],\n",
" trainable=False),\n",
" \n",
" layers.TimeDistributed(\n",
" layers.Dense(projected_dim,\n",
" activation=None,\n",
" use_bias=False))\n",
" ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Parikh model makes use of three feedforward blocks that construct nonlinear combinations of their input. Each block contains two ReLU layers and two dropout layers."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def create_feedforward(num_units=200, activation='relu', dropout_rate=0.2):\n",
" return models.Sequential([\n",
" layers.Dense(num_units, activation=activation),\n",
" layers.Dropout(dropout_rate),\n",
" layers.Dense(num_units, activation=activation),\n",
" layers.Dropout(dropout_rate)\n",
" ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic idea of the (Parikh et al, 2016) model is to:\n",
"\n",
"1. *Align*: Construct an alignment of subphrases in the text and hypothesis using an attention-like mechanism, called \"decompositional\" because the layer is applied to each of the two sentences individually rather than to their product. The dot product of the nonlinear transformations of the inputs is then normalized vertically and horizontally to yield a pair of \"soft\" alignment structures, from text->hypothesis and hypothesis->text. Concretely, for each word in one sentence, a multinomial distribution is computed over the words of the other sentence, by learning a multinomial logistic with softmax target.\n",
"2. *Compare*: Each word is now compared to its aligned phrase using a function modeled as a two-layer feedforward ReLU network. The output is a high-dimensional representation of the strength of association between word and aligned phrase.\n",
"3. *Aggregate*: The comparison vectors are summed, separately, for the text and the hypothesis. The result is two vectors: one that describes the degree of association of the text to the hypothesis, and the second, of the hypothesis to the text.\n",
"4. Finally, these two vectors are processed by a dense layer followed by a softmax classifier, as usual.\n",
"\n",
"Note that because in entailment the truth conditions of the consequent must be a subset of those of the antecedent, it is not obvious that we need both vectors in step (3). Entailment is not symmetric. It may be enough to just use the hypothesis->text vector. We will explore this possibility later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need a couple of little functions for Lambda layers to normalize and aggregate weights:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def normalizer(axis):\n",
" def _normalize(att_weights):\n",
" exp_weights = K.exp(att_weights)\n",
" sum_weights = K.sum(exp_weights, axis=axis, keepdims=True)\n",
" return exp_weights/sum_weights\n",
" return _normalize\n",
"\n",
"def sum_word(x):\n",
" return K.sum(x, axis=1)\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def build_model(vectors, max_length, num_hidden, num_classes, projected_dim, entail_dir='both'):\n",
" input1 = layers.Input(shape=(max_length,), dtype='int32', name='words1')\n",
" input2 = layers.Input(shape=(max_length,), dtype='int32', name='words2')\n",
" \n",
" # embeddings (projected)\n",
" embed = create_embedding(vectors, max_length, projected_dim)\n",
" \n",
" a = embed(input1)\n",
" b = embed(input2)\n",
" \n",
" # step 1: attend\n",
" F = create_feedforward(num_hidden)\n",
" att_weights = layers.dot([F(a), F(b)], axes=-1)\n",
" \n",
" G = create_feedforward(num_hidden)\n",
" \n",
" if entail_dir == 'both':\n",
" norm_weights_a = layers.Lambda(normalizer(1))(att_weights)\n",
" norm_weights_b = layers.Lambda(normalizer(2))(att_weights)\n",
" alpha = layers.dot([norm_weights_a, a], axes=1)\n",
" beta = layers.dot([norm_weights_b, b], axes=1)\n",
"\n",
" # step 2: compare\n",
" comp1 = layers.concatenate([a, beta])\n",
" comp2 = layers.concatenate([b, alpha])\n",
" v1 = layers.TimeDistributed(G)(comp1)\n",
" v2 = layers.TimeDistributed(G)(comp2)\n",
"\n",
" # step 3: aggregate\n",
" v1_sum = layers.Lambda(sum_word)(v1)\n",
" v2_sum = layers.Lambda(sum_word)(v2)\n",
" concat = layers.concatenate([v1_sum, v2_sum])\n",
" elif entail_dir == 'left':\n",
" norm_weights_a = layers.Lambda(normalizer(1))(att_weights)\n",
" alpha = layers.dot([norm_weights_a, a], axes=1)\n",
" comp2 = layers.concatenate([b, alpha])\n",
" v2 = layers.TimeDistributed(G)(comp2)\n",
" v2_sum = layers.Lambda(sum_word)(v2)\n",
" concat = v2_sum\n",
" else:\n",
" norm_weights_b = layers.Lambda(normalizer(2))(att_weights)\n",
" beta = layers.dot([norm_weights_b, b], axes=1)\n",
" comp1 = layers.concatenate([a, beta])\n",
" v1 = layers.TimeDistributed(G)(comp1)\n",
" v1_sum = layers.Lambda(sum_word)(v1)\n",
" concat = v1_sum\n",
" \n",
" H = create_feedforward(num_hidden)\n",
" out = H(concat)\n",
" out = layers.Dense(num_classes, activation='softmax')(out)\n",
" \n",
" model = Model([input1, input2], out)\n",
" \n",
" model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
" return model\n",
" \n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_1 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_2 (Sequential) (None, 50, 200) 80400 sequential_1[1][0] \n",
" sequential_1[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_1 (Dot) (None, 50, 50) 0 sequential_2[1][0] \n",
" sequential_2[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_2 (Lambda) (None, 50, 50) 0 dot_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_1 (Lambda) (None, 50, 50) 0 dot_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_3 (Dot) (None, 50, 200) 0 lambda_2[0][0] \n",
" sequential_1[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_2 (Dot) (None, 50, 200) 0 lambda_1[0][0] \n",
" sequential_1[1][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_1 (Concatenate) (None, 50, 400) 0 sequential_1[1][0] \n",
" dot_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_2 (Concatenate) (None, 50, 400) 0 sequential_1[2][0] \n",
" dot_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_2 (TimeDistrib (None, 50, 200) 120400 concatenate_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_3 (TimeDistrib (None, 50, 200) 120400 concatenate_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_3 (Lambda) (None, 200) 0 time_distributed_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_4 (Lambda) (None, 200) 0 time_distributed_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_3 (Concatenate) (None, 400) 0 lambda_3[0][0] \n",
" lambda_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_4 (Sequential) (None, 200) 120400 concatenate_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_8 (Dense) (None, 3) 603 sequential_4[1][0] \n",
"==================================================================================================\n",
"Total params: 321,703,403\n",
"Trainable params: 381,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"K.clear_session()\n",
"m = build_model(sem_vectors, 50, 200, 3, 200)\n",
"m.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of trainable parameters, ~381k, is the number given by Parikh et al, so we're on the right track."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parikh et al use tiny batches of 4, training for 50MM batches, which amounts to around 500 epochs. Here we'll use large batches to better use the GPU, and train for fewer epochs -- for purposes of this experiment."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 549367 samples, validate on 9824 samples\n",
"Epoch 1/50\n",
"549367/549367 [==============================] - 34s 62us/step - loss: 0.7599 - acc: 0.6617 - val_loss: 0.5396 - val_acc: 0.7861\n",
"Epoch 2/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.5611 - acc: 0.7763 - val_loss: 0.4892 - val_acc: 0.8085\n",
"Epoch 3/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.5212 - acc: 0.7948 - val_loss: 0.4574 - val_acc: 0.8261\n",
"Epoch 4/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4986 - acc: 0.8045 - val_loss: 0.4410 - val_acc: 0.8274\n",
"Epoch 5/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4819 - acc: 0.8114 - val_loss: 0.4224 - val_acc: 0.8383\n",
"Epoch 6/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4714 - acc: 0.8166 - val_loss: 0.4200 - val_acc: 0.8379\n",
"Epoch 7/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4633 - acc: 0.8203 - val_loss: 0.4098 - val_acc: 0.8457\n",
"Epoch 8/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4558 - acc: 0.8232 - val_loss: 0.4114 - val_acc: 0.8415\n",
"Epoch 9/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4508 - acc: 0.8250 - val_loss: 0.4062 - val_acc: 0.8477\n",
"Epoch 10/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4433 - acc: 0.8286 - val_loss: 0.3982 - val_acc: 0.8486\n",
"Epoch 11/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4388 - acc: 0.8307 - val_loss: 0.3953 - val_acc: 0.8497\n",
"Epoch 12/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4351 - acc: 0.8321 - val_loss: 0.3973 - val_acc: 0.8522\n",
"Epoch 13/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4309 - acc: 0.8342 - val_loss: 0.3939 - val_acc: 0.8539\n",
"Epoch 14/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4269 - acc: 0.8355 - val_loss: 0.3932 - val_acc: 0.8517\n",
"Epoch 15/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4247 - acc: 0.8369 - val_loss: 0.3938 - val_acc: 0.8515\n",
"Epoch 16/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4208 - acc: 0.8379 - val_loss: 0.3936 - val_acc: 0.8504\n",
"Epoch 17/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4194 - acc: 0.8390 - val_loss: 0.3885 - val_acc: 0.8560\n",
"Epoch 18/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4162 - acc: 0.8402 - val_loss: 0.3874 - val_acc: 0.8561\n",
"Epoch 19/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4140 - acc: 0.8409 - val_loss: 0.3889 - val_acc: 0.8545\n",
"Epoch 20/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4114 - acc: 0.8426 - val_loss: 0.3864 - val_acc: 0.8583\n",
"Epoch 21/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4092 - acc: 0.8430 - val_loss: 0.3870 - val_acc: 0.8561\n",
"Epoch 22/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4062 - acc: 0.8442 - val_loss: 0.3852 - val_acc: 0.8577\n",
"Epoch 23/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4050 - acc: 0.8450 - val_loss: 0.3850 - val_acc: 0.8578\n",
"Epoch 24/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4035 - acc: 0.8455 - val_loss: 0.3825 - val_acc: 0.8555\n",
"Epoch 25/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4018 - acc: 0.8460 - val_loss: 0.3837 - val_acc: 0.8573\n",
"Epoch 26/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3989 - acc: 0.8476 - val_loss: 0.3843 - val_acc: 0.8599\n",
"Epoch 27/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3979 - acc: 0.8481 - val_loss: 0.3841 - val_acc: 0.8589\n",
"Epoch 28/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3967 - acc: 0.8484 - val_loss: 0.3811 - val_acc: 0.8575\n",
"Epoch 29/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3956 - acc: 0.8492 - val_loss: 0.3829 - val_acc: 0.8589\n",
"Epoch 30/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3938 - acc: 0.8499 - val_loss: 0.3859 - val_acc: 0.8562\n",
"Epoch 31/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3925 - acc: 0.8500 - val_loss: 0.3798 - val_acc: 0.8587\n",
"Epoch 32/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3906 - acc: 0.8509 - val_loss: 0.3834 - val_acc: 0.8569\n",
"Epoch 33/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3893 - acc: 0.8511 - val_loss: 0.3806 - val_acc: 0.8588\n",
"Epoch 34/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3885 - acc: 0.8515 - val_loss: 0.3828 - val_acc: 0.8603\n",
"Epoch 35/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3879 - acc: 0.8520 - val_loss: 0.3800 - val_acc: 0.8594\n",
"Epoch 36/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3860 - acc: 0.8530 - val_loss: 0.3796 - val_acc: 0.8577\n",
"Epoch 37/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3856 - acc: 0.8532 - val_loss: 0.3857 - val_acc: 0.8591\n",
"Epoch 38/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3838 - acc: 0.8535 - val_loss: 0.3835 - val_acc: 0.8603\n",
"Epoch 39/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3830 - acc: 0.8543 - val_loss: 0.3830 - val_acc: 0.8599\n",
"Epoch 40/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3818 - acc: 0.8548 - val_loss: 0.3832 - val_acc: 0.8559\n",
"Epoch 41/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3806 - acc: 0.8551 - val_loss: 0.3845 - val_acc: 0.8553\n",
"Epoch 42/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3803 - acc: 0.8550 - val_loss: 0.3789 - val_acc: 0.8617\n",
"Epoch 43/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3791 - acc: 0.8556 - val_loss: 0.3835 - val_acc: 0.8580\n",
"Epoch 44/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3778 - acc: 0.8565 - val_loss: 0.3799 - val_acc: 0.8580\n",
"Epoch 45/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3766 - acc: 0.8571 - val_loss: 0.3790 - val_acc: 0.8625\n",
"Epoch 46/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3770 - acc: 0.8569 - val_loss: 0.3820 - val_acc: 0.8590\n",
"Epoch 47/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3761 - acc: 0.8573 - val_loss: 0.3831 - val_acc: 0.8581\n",
"Epoch 48/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3739 - acc: 0.8579 - val_loss: 0.3828 - val_acc: 0.8599\n",
"Epoch 49/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3738 - acc: 0.8577 - val_loss: 0.3785 - val_acc: 0.8590\n",
"Epoch 50/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3726 - acc: 0.8580 - val_loss: 0.3820 - val_acc: 0.8585\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7f5c9f49c438>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=50,validation_data=([text_vectors_test, hypothesis_vectors_test], labels_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is broadly in the region reported by Parikh et al: ~86 vs 86.3%. The small difference might be accounted by differences in `max_length` (here set at 50), in the training regime, and that here we use Keras' built-in validation splitting rather than the SNLI test set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiment: the asymmetric model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It was suggested earlier that, based on the semantics of entailment, the vector representing the strength of association between the hypothesis to the text is all that is needed for classifying the entailment.\n",
"\n",
"The following model removes consideration of the complementary vector (text to hypothesis) from the computation. This will decrease the paramater count slightly, because the final dense layers will be smaller, and speed up the forward pass when predicting, because fewer calculations will be needed."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_5 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_6 (Sequential) (None, 50, 200) 80400 sequential_5[1][0] \n",
" sequential_5[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_4 (Dot) (None, 50, 50) 0 sequential_6[1][0] \n",
" sequential_6[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_5 (Lambda) (None, 50, 50) 0 dot_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_5 (Dot) (None, 50, 200) 0 lambda_5[0][0] \n",
" sequential_5[1][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_4 (Concatenate) (None, 50, 400) 0 sequential_5[2][0] \n",
" dot_5[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_5 (TimeDistrib (None, 50, 200) 120400 concatenate_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_6 (Lambda) (None, 200) 0 time_distributed_5[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_8 (Sequential) (None, 200) 80400 lambda_6[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_16 (Dense) (None, 3) 603 sequential_8[1][0] \n",
"==================================================================================================\n",
"Total params: 321,663,403\n",
"Trainable params: 341,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"m1 = build_model(sem_vectors, 50, 200, 3, 200, 'left')\n",
"m1.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter count has indeed decreased by 40,000, corresponding to the 200x200 smaller H function."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 549367 samples, validate on 9824 samples\n",
"Epoch 1/50\n",
"549367/549367 [==============================] - 25s 46us/step - loss: 0.7331 - acc: 0.6770 - val_loss: 0.5257 - val_acc: 0.7936\n",
"Epoch 2/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.5518 - acc: 0.7799 - val_loss: 0.4717 - val_acc: 0.8159\n",
"Epoch 3/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.5147 - acc: 0.7967 - val_loss: 0.4449 - val_acc: 0.8278\n",
"Epoch 4/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4948 - acc: 0.8060 - val_loss: 0.4326 - val_acc: 0.8344\n",
"Epoch 5/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4814 - acc: 0.8122 - val_loss: 0.4247 - val_acc: 0.8359\n",
"Epoch 6/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4712 - acc: 0.8162 - val_loss: 0.4143 - val_acc: 0.8430\n",
"Epoch 7/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4635 - acc: 0.8205 - val_loss: 0.4172 - val_acc: 0.8401\n",
"Epoch 8/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4570 - acc: 0.8223 - val_loss: 0.4106 - val_acc: 0.8422\n",
"Epoch 9/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4505 - acc: 0.8259 - val_loss: 0.4043 - val_acc: 0.8451\n",
"Epoch 10/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4459 - acc: 0.8280 - val_loss: 0.4050 - val_acc: 0.8467\n",
"Epoch 11/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4405 - acc: 0.8300 - val_loss: 0.3975 - val_acc: 0.8481\n",
"Epoch 12/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4360 - acc: 0.8324 - val_loss: 0.4026 - val_acc: 0.8496\n",
"Epoch 13/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4327 - acc: 0.8334 - val_loss: 0.4024 - val_acc: 0.8471\n",
"Epoch 14/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4293 - acc: 0.8350 - val_loss: 0.3955 - val_acc: 0.8496\n",
"Epoch 15/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4263 - acc: 0.8369 - val_loss: 0.3980 - val_acc: 0.8490\n",
"Epoch 16/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4236 - acc: 0.8377 - val_loss: 0.3958 - val_acc: 0.8496\n",
"Epoch 17/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4213 - acc: 0.8384 - val_loss: 0.3954 - val_acc: 0.8496\n",
"Epoch 18/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4187 - acc: 0.8394 - val_loss: 0.3929 - val_acc: 0.8514\n",
"Epoch 19/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4157 - acc: 0.8409 - val_loss: 0.3939 - val_acc: 0.8507\n",
"Epoch 20/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4135 - acc: 0.8417 - val_loss: 0.3953 - val_acc: 0.8522\n",
"Epoch 21/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4122 - acc: 0.8424 - val_loss: 0.3974 - val_acc: 0.8506\n",
"Epoch 22/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4099 - acc: 0.8435 - val_loss: 0.3918 - val_acc: 0.8522\n",
"Epoch 23/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4075 - acc: 0.8443 - val_loss: 0.3901 - val_acc: 0.8513\n",
"Epoch 24/50\n",
"549367/549367 [==============================] - 24s 44us/step - loss: 0.4067 - acc: 0.8447 - val_loss: 0.3885 - val_acc: 0.8543\n",
"Epoch 25/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4047 - acc: 0.8454 - val_loss: 0.3846 - val_acc: 0.8531\n",
"Epoch 26/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4031 - acc: 0.8461 - val_loss: 0.3864 - val_acc: 0.8562\n",
"Epoch 27/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4020 - acc: 0.8467 - val_loss: 0.3874 - val_acc: 0.8546\n",
"Epoch 28/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4001 - acc: 0.8473 - val_loss: 0.3848 - val_acc: 0.8534\n",
"Epoch 29/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3991 - acc: 0.8479 - val_loss: 0.3865 - val_acc: 0.8562\n",
"Epoch 30/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3976 - acc: 0.8484 - val_loss: 0.3833 - val_acc: 0.8574\n",
"Epoch 31/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3961 - acc: 0.8487 - val_loss: 0.3846 - val_acc: 0.8585\n",
"Epoch 32/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3942 - acc: 0.8498 - val_loss: 0.3805 - val_acc: 0.8573\n",
"Epoch 33/50\n",
"549367/549367 [==============================] - 24s 44us/step - loss: 0.3935 - acc: 0.8503 - val_loss: 0.3856 - val_acc: 0.8579\n",
"Epoch 34/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3923 - acc: 0.8507 - val_loss: 0.3829 - val_acc: 0.8560\n",
"Epoch 35/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3920 - acc: 0.8508 - val_loss: 0.3864 - val_acc: 0.8575\n",
"Epoch 36/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3907 - acc: 0.8516 - val_loss: 0.3873 - val_acc: 0.8563\n",
"Epoch 37/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3891 - acc: 0.8519 - val_loss: 0.3850 - val_acc: 0.8570\n",
"Epoch 38/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3872 - acc: 0.8522 - val_loss: 0.3815 - val_acc: 0.8591\n",
"Epoch 39/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3887 - acc: 0.8520 - val_loss: 0.3829 - val_acc: 0.8590\n",
"Epoch 40/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3868 - acc: 0.8531 - val_loss: 0.3807 - val_acc: 0.8600\n",
"Epoch 41/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3859 - acc: 0.8537 - val_loss: 0.3832 - val_acc: 0.8574\n",
"Epoch 42/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3849 - acc: 0.8537 - val_loss: 0.3850 - val_acc: 0.8576\n",
"Epoch 43/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3834 - acc: 0.8541 - val_loss: 0.3825 - val_acc: 0.8563\n",
"Epoch 44/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3829 - acc: 0.8548 - val_loss: 0.3844 - val_acc: 0.8540\n",
"Epoch 45/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3816 - acc: 0.8552 - val_loss: 0.3841 - val_acc: 0.8559\n",
"Epoch 46/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3816 - acc: 0.8549 - val_loss: 0.3880 - val_acc: 0.8567\n",
"Epoch 47/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.3799 - acc: 0.8559 - val_loss: 0.3767 - val_acc: 0.8635\n",
"Epoch 48/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3800 - acc: 0.8560 - val_loss: 0.3786 - val_acc: 0.8563\n",
"Epoch 49/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3781 - acc: 0.8563 - val_loss: 0.3812 - val_acc: 0.8596\n",
"Epoch 50/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3788 - acc: 0.8560 - val_loss: 0.3782 - val_acc: 0.8601\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7f5ca1bf3e48>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=50,validation_data=([text_vectors_test, hypothesis_vectors_test], labels_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This model performs the same as the slightly more complex model that evaluates alignments in both directions. Note also that processing time is improved, from 64 down to 48 microseconds per step. \n",
"\n",
"Let's now look at an asymmetric model that evaluates text to hypothesis comparisons. The prediction is that such a model will correctly classify a decent proportion of the exemplars, but not as accurately as the previous two.\n",
"\n",
"We'll just use 10 epochs for expediency."
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_13 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_14 (Sequential) (None, 50, 200) 80400 sequential_13[1][0] \n",
" sequential_13[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_8 (Dot) (None, 50, 50) 0 sequential_14[1][0] \n",
" sequential_14[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_9 (Lambda) (None, 50, 50) 0 dot_8[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_9 (Dot) (None, 50, 200) 0 lambda_9[0][0] \n",
" sequential_13[2][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_6 (Concatenate) (None, 50, 400) 0 sequential_13[1][0] \n",
" dot_9[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_9 (TimeDistrib (None, 50, 200) 120400 concatenate_6[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_10 (Lambda) (None, 200) 0 time_distributed_9[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_16 (Sequential) (None, 200) 80400 lambda_10[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_32 (Dense) (None, 3) 603 sequential_16[1][0] \n",
"==================================================================================================\n",
"Total params: 321,663,403\n",
"Trainable params: 341,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"m2 = build_model(sem_vectors, 50, 200, 3, 200, 'right')\n",
"m2.summary()"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 455226 samples, validate on 113807 samples\n",
"Epoch 1/10\n",
"455226/455226 [==============================] - 22s 49us/step - loss: 0.8920 - acc: 0.5771 - val_loss: 0.8001 - val_acc: 0.6435\n",
"Epoch 2/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7808 - acc: 0.6553 - val_loss: 0.7267 - val_acc: 0.6855\n",
"Epoch 3/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7329 - acc: 0.6825 - val_loss: 0.6966 - val_acc: 0.7006\n",
"Epoch 4/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7055 - acc: 0.6978 - val_loss: 0.6713 - val_acc: 0.7150\n",
"Epoch 5/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6862 - acc: 0.7081 - val_loss: 0.6533 - val_acc: 0.7253\n",
"Epoch 6/10\n",
"455226/455226 [==============================] - 21s 47us/step - loss: 0.6694 - acc: 0.7179 - val_loss: 0.6472 - val_acc: 0.7277\n",
"Epoch 7/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6555 - acc: 0.7252 - val_loss: 0.6338 - val_acc: 0.7347\n",
"Epoch 8/10\n",
"455226/455226 [==============================] - 22s 48us/step - loss: 0.6434 - acc: 0.7310 - val_loss: 0.6246 - val_acc: 0.7385\n",
"Epoch 9/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6325 - acc: 0.7367 - val_loss: 0.6164 - val_acc: 0.7424\n",
"Epoch 10/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6216 - acc: 0.7426 - val_loss: 0.6082 - val_acc: 0.7478\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7fa6850cf080>"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m2.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=10,validation_split=.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing this fit to the validation accuracy of the previous two models after 10 epochs, we observe that its accuracy is roughly 10% lower.\n",
"\n",
"It is reassuring that the neural modeling here reproduces what we know from the semantics of natural language!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}