Commit 9e4079ddb2 by Matthew Honnibal, 2018-10-02 19:44:43 +02:00
13 changed files with 1326 additions and 394 deletions

keras_parikh_entailment/README.md

@ -2,11 +2,7 @@
# A decomposable attention model for Natural Language Inference
**by Matthew Honnibal, [@honnibal](https://github.com/honnibal)**
> ⚠️ **IMPORTANT NOTE:** This example is currently only compatible with spaCy
> v1.x. We're working on porting the example over to Keras v2.x and spaCy v2.x.
> See [#1445](https://github.com/explosion/spaCy/issues/1445) for details
> contributions welcome!
**Updated for spaCy 2.0+ and Keras 2.2.2+ by John Stewart, [@free-variation](https://github.com/free-variation)**
This directory contains an implementation of the entailment prediction model described
by [Parikh et al. (2016)](https://arxiv.org/pdf/1606.01933.pdf). The model is notable
@ -21,19 +17,25 @@ hook is installed to customise the `.similarity()` method of spaCy's `Doc`
and `Span` objects:
```python
def demo(model_dir):
nlp = spacy.load('en', path=model_dir,
create_pipeline=create_similarity_pipeline)
doc1 = nlp(u'Worst fries ever! Greasy and horrible...')
doc2 = nlp(u'The milkshakes are good. The fries are bad.')
print(doc1.similarity(doc2))
sent1a, sent1b = doc1.sents
print(sent1a.similarity(sent1b))
print(sent1a.similarity(doc2))
print(sent1b.similarity(doc2))
def demo(shape):
nlp = spacy.load('en_vectors_web_lg')
nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
doc1 = nlp(u'The king of France is bald.')
doc2 = nlp(u'France has no king.')
print("Sentence 1:", doc1)
print("Sentence 2:", doc2)
entailment_type, confidence = doc1.similarity(doc2)
print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")
```
This gives the output `Entailment type: contradiction (Confidence: 0.60604566)`, showing that
the system has definite opinions about Bertrand Russell's [famous conundrum](https://users.drew.edu/jlenz/br-on-denoting.html)!
I'm working on a blog post to explain Parikh et al.'s model in more detail.
A [notebook](https://github.com/free-variation/spaCy/blob/master/examples/notebooks/Decompositional%20Attention.ipynb) is available that briefly explains this implementation.
I think it is a very interesting example of the attention mechanism, which
I didn't understand very well before working through this paper. There are
lots of ways to extend the model.
@ -43,7 +45,7 @@ lots of ways to extend the model.
| File | Description |
| --- | --- |
| `__main__.py` | The script that will be executed. Defines the CLI, the data reading, etc — all the boring stuff. |
| `spacy_hook.py` | Provides a class `SimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
| `spacy_hook.py` | Provides a class `KerasSimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)` (see the sketch below the table). |
| `keras_decomposable_attention.py` | Defines the neural network model. |
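As noted in the table, `spacy_hook.py` wires the model in through spaCy's `user_hooks` mechanism. The sketch below shows that mechanism in isolation, assuming spaCy v2.x and the `en_vectors_web_lg` model; `toy_model` and `SimpleSimilarityHook` are made-up placeholders for illustration, not the `KerasSimilarityShim` class this example actually provides.

```python
# Minimal sketch of the similarity-hook mechanism (spaCy v2.x assumed).
# `toy_model` is a hypothetical stand-in for the Keras network.
import spacy

def toy_model(doc1, doc2):
    # Any callable that scores a pair of Docs will do here.
    return doc1.vector.dot(doc2.vector)

class SimpleSimilarityHook(object):
    def __call__(self, doc):
        # Route .similarity() calls for this Doc (and its Spans) to our model.
        doc.user_hooks['similarity'] = self.predict
        doc.user_span_hooks['similarity'] = self.predict
        return doc

    def predict(self, doc1, doc2):
        return toy_model(doc1, doc2)

nlp = spacy.load('en_vectors_web_lg')
nlp.add_pipe(SimpleSimilarityHook())

doc1 = nlp(u'The fries were great.')
doc2 = nlp(u'The fries were terrible.')
print(doc1.similarity(doc2))  # now calls toy_model instead of the default
```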
## Setting up
@ -52,17 +54,13 @@ First, install [Keras](https://keras.io/), [spaCy](https://spacy.io) and the spa
English models (about 1GB of data):
```bash
pip install https://github.com/fchollet/keras/archive/1.2.2.zip
pip install keras
pip install spacy
python -m spacy.en.download
python -m spacy download en_vectors_web_lg
```
⚠️ **Important:** In order for the example to run, you'll need to install Keras from
the 1.2.2 release (and not via `pip install keras`). For more info on this, see
[#727](https://github.com/explosion/spaCy/issues/727).
You'll also want to get Keras working on your GPU. This will depend on your
set up, so you're mostly on your own for this step. If you're using AWS, try the
You'll also want to get Keras working on your GPU, and you will need a backend, such as TensorFlow or Theano.
This will depend on your setup, so you're mostly on your own for this step. If you're using AWS, try the
[NVidia AMI](https://aws.amazon.com/marketplace/pp/B00FYCDDTE). It made things pretty easy.
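Before moving on, a quick sanity check (a minimal sketch, assuming Keras is installed with one of the backends above) that Keras imports and reports the backend you expect:

```python
# Check which backend Keras picked up; expect 'tensorflow' or 'theano',
# depending on your installation.
from keras import backend as K

print(K.backend())
```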
Once you've installed the dependencies, you can run a small preliminary test of
@ -80,22 +78,35 @@ Finally, download the [Stanford Natural Language Inference corpus](http://nlp.st
## Running the example
You can run the `keras_parikh_entailment/` directory as a script, which executes the file
[`keras_parikh_entailment/__main__.py`](__main__.py). The first thing you'll want to do is train the model:
[`keras_parikh_entailment/__main__.py`](__main__.py). If you run the script without arguments
the usage is shown. Running it with `-h` explains the command line arguments.
The first thing you'll want to do is train the model:
```bash
python keras_parikh_entailment/ train <train_directory> <dev_directory>
python keras_parikh_entailment/ train -t <path to SNLI train JSON> -s <path to SNLI dev JSON>
```
Training takes about 300 epochs for full accuracy, and I haven't rerun the full
experiment since refactoring things to publish this example — please let me
know if I've broken something. You should get to at least 85% on the development data.
know if I've broken something. You should get to at least 85% accuracy on the development data after only 10-15 epochs.
The other two modes demonstrate run-time usage. I never like relying on the accuracy printed
by `.fit()` methods. I never really feel confident until I've run a new process that loads
the model and starts making predictions, without access to the gold labels. I've therefore
included an `evaluate` mode. Finally, there's also a little demo, which mostly exists to show
included an `evaluate` mode.
```bash
python keras_parikh_entailment/ evaluate -s <path to SNLI train JSON>
```
Finally, there's also a little demo, which mostly exists to show
you how run-time usage will eventually look.
```bash
python keras_parikh_entailment/ demo
```
## Getting updates
We should have the blog post explaining the model ready before the end of the week. To get

keras_parikh_entailment/__main__.py

@ -1,82 +1,104 @@
from __future__ import division, unicode_literals, print_function
import spacy
import plac
from pathlib import Path
import numpy as np
import ujson as json
import numpy
from keras.utils.np_utils import to_categorical
from spacy_hook import get_embeddings, get_word_ids
from spacy_hook import create_similarity_pipeline
from keras.utils import to_categorical
import plac
import sys
from keras_decomposable_attention import build_model
from spacy_hook import get_embeddings, KerasSimilarityShim
try:
import cPickle as pickle
except ImportError:
import pickle
import spacy
# workaround for keras/tensorflow bug
# see https://github.com/tensorflow/tensorflow/issues/3388
import os
import importlib
from keras import backend as K
def set_keras_backend(backend):
if K.backend() != backend:
os.environ['KERAS_BACKEND'] = backend
importlib.reload(K)
assert K.backend() == backend
if backend == "tensorflow":
K.get_session().close()
cfg = K.tf.ConfigProto()
cfg.gpu_options.allow_growth = True
K.set_session(K.tf.Session(config=cfg))
K.clear_session()
set_keras_backend("tensorflow")
def train(train_loc, dev_loc, shape, settings):
train_texts1, train_texts2, train_labels = read_snli(train_loc)
dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)
print("Loading spaCy")
nlp = spacy.load('en')
nlp = spacy.load('en_vectors_web_lg')
assert nlp.path is not None
print("Processing texts...")
train_X = create_dataset(nlp, train_texts1, train_texts2, 100, shape[0])
dev_X = create_dataset(nlp, dev_texts1, dev_texts2, 100, shape[0])
print("Compiling network")
model = build_model(get_embeddings(nlp.vocab), shape, settings)
print("Processing texts...")
Xs = []
for texts in (train_texts1, train_texts2, dev_texts1, dev_texts2):
Xs.append(get_word_ids(list(nlp.pipe(texts, n_threads=20, batch_size=20000)),
max_length=shape[0],
rnn_encode=settings['gru_encode'],
tree_truncate=settings['tree_truncate']))
train_X1, train_X2, dev_X1, dev_X2 = Xs
print(settings)
model.fit(
[train_X1, train_X2],
train_X,
train_labels,
validation_data=([dev_X1, dev_X2], dev_labels),
nb_epoch=settings['nr_epoch'],
batch_size=settings['batch_size'])
validation_data = (dev_X, dev_labels),
epochs = settings['nr_epoch'],
batch_size = settings['batch_size'])
if not (nlp.path / 'similarity').exists():
(nlp.path / 'similarity').mkdir()
print("Saving to", nlp.path / 'similarity')
weights = model.get_weights()
# remove the embedding matrix. We can reconstruct it.
del weights[1]
with (nlp.path / 'similarity' / 'model').open('wb') as file_:
pickle.dump(weights[1:], file_)
with (nlp.path / 'similarity' / 'config.json').open('wb') as file_:
pickle.dump(weights, file_)
with (nlp.path / 'similarity' / 'config.json').open('w') as file_:
file_.write(model.to_json())
def evaluate(dev_loc):
def evaluate(dev_loc, shape):
dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)
nlp = spacy.load('en',
create_pipeline=create_similarity_pipeline)
nlp = spacy.load('en_vectors_web_lg')
nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
total = 0.
correct = 0.
for text1, text2, label in zip(dev_texts1, dev_texts2, dev_labels):
doc1 = nlp(text1)
doc2 = nlp(text2)
sim = doc1.similarity(doc2)
if sim.argmax() == label.argmax():
sim, _ = doc1.similarity(doc2)
if sim == KerasSimilarityShim.entailment_types[label.argmax()]:
correct += 1
total += 1
return correct, total
def demo():
nlp = spacy.load('en',
create_pipeline=create_similarity_pipeline)
doc1 = nlp(u'What were the best crime fiction books in 2016?')
doc2 = nlp(
u'What should I read that was published last year? I like crime stories.')
print(doc1)
print(doc2)
print("Similarity", doc1.similarity(doc2))
def demo(shape):
nlp = spacy.load('en_vectors_web_lg')
nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
doc1 = nlp(u'The king of France is bald.')
doc2 = nlp(u'France has no king.')
print("Sentence 1:", doc1)
print("Sentence 2:", doc2)
entailment_type, confidence = doc1.similarity(doc2)
print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")
LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}
@ -84,56 +106,92 @@ def read_snli(path):
texts1 = []
texts2 = []
labels = []
with path.open() as file_:
with open(path, 'r') as file_:
for line in file_:
eg = json.loads(line)
label = eg['gold_label']
if label == '-':
if label == '-': # per Parikh, ignore - SNLI entries
continue
texts1.append(eg['sentence1'])
texts2.append(eg['sentence2'])
labels.append(LABELS[label])
return texts1, texts2, to_categorical(numpy.asarray(labels, dtype='int32'))
return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))
def create_dataset(nlp, texts, hypotheses, num_unk, max_length):
sents = texts + hypotheses
sents_as_ids = []
for sent in sents:
doc = nlp(sent)
word_ids = []
for i, token in enumerate(doc):
# skip odd spaces from tokenizer
if token.has_vector and token.vector_norm == 0:
continue
if i > max_length:
break
if token.has_vector:
word_ids.append(token.rank + num_unk + 1)
else:
# if we don't have a vector, pick an OOV entry
word_ids.append(token.rank % num_unk + 1)
# there must be a simpler way of generating padded arrays from lists...
word_id_vec = np.zeros((max_length), dtype='int')
clipped_len = min(max_length, len(word_ids))
word_id_vec[:clipped_len] = word_ids[:clipped_len]
sents_as_ids.append(word_id_vec)
return [np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])]
@plac.annotations(
mode=("Mode to execute", "positional", None, str, ["train", "evaluate", "demo"]),
train_loc=("Path to training data", "positional", None, Path),
dev_loc=("Path to development data", "positional", None, Path),
train_loc=("Path to training data", "option", "t", str),
dev_loc=("Path to development or test data", "option", "s", str),
max_length=("Length to truncate sentences", "option", "L", int),
nr_hidden=("Number of hidden units", "option", "H", int),
dropout=("Dropout level", "option", "d", float),
learn_rate=("Learning rate", "option", "e", float),
learn_rate=("Learning rate", "option", "r", float),
batch_size=("Batch size for neural network training", "option", "b", int),
nr_epoch=("Number of training epochs", "option", "i", int),
tree_truncate=("Truncate sentences by tree distance", "flag", "T", bool),
gru_encode=("Encode sentences with bidirectional GRU", "flag", "E", bool),
nr_epoch=("Number of training epochs", "option", "e", int),
entail_dir=("Direction of entailment", "option", "D", str, ["both", "left", "right"])
)
def main(mode, train_loc, dev_loc,
tree_truncate=False,
gru_encode=False,
max_length=100,
nr_hidden=100,
dropout=0.2,
learn_rate=0.001,
batch_size=100,
nr_epoch=5):
max_length = 50,
nr_hidden = 200,
dropout = 0.2,
learn_rate = 0.001,
batch_size = 1024,
nr_epoch = 10,
entail_dir="both"):
shape = (max_length, nr_hidden, 3)
settings = {
'lr': learn_rate,
'dropout': dropout,
'batch_size': batch_size,
'nr_epoch': nr_epoch,
'tree_truncate': tree_truncate,
'gru_encode': gru_encode
'entail_dir': entail_dir
}
if mode == 'train':
if train_loc == None or dev_loc == None:
print("Train mode requires paths to training and development data sets.")
sys.exit(1)
train(train_loc, dev_loc, shape, settings)
elif mode == 'evaluate':
correct, total = evaluate(dev_loc)
if dev_loc == None:
print("Evaluate mode requires paths to test data set.")
sys.exit(1)
correct, total = evaluate(dev_loc, shape)
print(correct, '/', total, correct / total)
else:
demo()
demo(shape)
if __name__ == '__main__':
plac.call(main)

keras_parikh_entailment/keras_decomposable_attention.py

@ -1,259 +1,137 @@
# Semantic similarity with decomposable attention (using spaCy and Keras)
# Practical state-of-the-art text similarity with spaCy and Keras
import numpy
from keras.layers import InputSpec, Layer, Input, Dense, merge
from keras.layers import Lambda, Activation, Dropout, Embedding, TimeDistributed
from keras.layers import Bidirectional, GRU, LSTM
from keras.layers.noise import GaussianNoise
from keras.layers.advanced_activations import ELU
import keras.backend as K
from keras.models import Sequential, Model, model_from_json
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import GlobalAveragePooling1D, GlobalMaxPooling1D
from keras.layers import Merge
# Semantic entailment/similarity with decomposable attention (using spaCy and Keras)
# Practical state-of-the-art textual entailment with spaCy and Keras
import numpy as np
from keras import layers, Model, models, optimizers
from keras import backend as K
def build_model(vectors, shape, settings):
'''Compile the model.'''
max_length, nr_hidden, nr_class = shape
# Declare inputs.
ids1 = Input(shape=(max_length,), dtype='int32', name='words1')
ids2 = Input(shape=(max_length,), dtype='int32', name='words2')
# Construct operations, which we'll chain together.
embed = _StaticEmbedding(vectors, max_length, nr_hidden, dropout=0.2, nr_tune=5000)
if settings['gru_encode']:
encode = _BiRNNEncoding(max_length, nr_hidden, dropout=settings['dropout'])
attend = _Attention(max_length, nr_hidden, dropout=settings['dropout'])
align = _SoftAlignment(max_length, nr_hidden)
compare = _Comparison(max_length, nr_hidden, dropout=settings['dropout'])
entail = _Entailment(nr_hidden, nr_class, dropout=settings['dropout'])
input1 = layers.Input(shape=(max_length,), dtype='int32', name='words1')
input2 = layers.Input(shape=(max_length,), dtype='int32', name='words2')
# Declare the model as a computational graph.
sent1 = embed(ids1) # Shape: (i, n)
sent2 = embed(ids2) # Shape: (j, n)
# embeddings (projected)
embed = create_embedding(vectors, max_length, nr_hidden)
if settings['gru_encode']:
sent1 = encode(sent1)
sent2 = encode(sent2)
a = embed(input1)
b = embed(input2)
attention = attend(sent1, sent2) # Shape: (i, j)
# step 1: attend
F = create_feedforward(nr_hidden)
att_weights = layers.dot([F(a), F(b)], axes=-1)
align1 = align(sent2, attention)
align2 = align(sent1, attention, transpose=True)
G = create_feedforward(nr_hidden)
feats1 = compare(sent1, align1)
feats2 = compare(sent2, align2)
if settings['entail_dir'] == 'both':
norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
alpha = layers.dot([norm_weights_a, a], axes=1)
beta = layers.dot([norm_weights_b, b], axes=1)
scores = entail(feats1, feats2)
# step 2: compare
comp1 = layers.concatenate([a, beta])
comp2 = layers.concatenate([b, alpha])
v1 = layers.TimeDistributed(G)(comp1)
v2 = layers.TimeDistributed(G)(comp2)
# Now that we have the input/output, we can construct the Model object...
model = Model(input=[ids1, ids2], output=[scores])
# step 3: aggregate
v1_sum = layers.Lambda(sum_word)(v1)
v2_sum = layers.Lambda(sum_word)(v2)
concat = layers.concatenate([v1_sum, v2_sum])
elif settings['entail_dir'] == 'left':
norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
alpha = layers.dot([norm_weights_a, a], axes=1)
comp2 = layers.concatenate([b, alpha])
v2 = layers.TimeDistributed(G)(comp2)
v2_sum = layers.Lambda(sum_word)(v2)
concat = v2_sum
else:
norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
beta = layers.dot([norm_weights_b, b], axes=1)
comp1 = layers.concatenate([a, beta])
v1 = layers.TimeDistributed(G)(comp1)
v1_sum = layers.Lambda(sum_word)(v1)
concat = v1_sum
H = create_feedforward(nr_hidden)
out = H(concat)
out = layers.Dense(nr_class, activation='softmax')(out)
model = Model([input1, input2], out)
# ...Compile it...
model.compile(
optimizer=Adam(lr=settings['lr']),
optimizer=optimizers.Adam(lr=settings['lr']),
loss='categorical_crossentropy',
metrics=['accuracy'])
# ...And return it for training.
return model
class _StaticEmbedding(object):
def __init__(self, vectors, max_length, nr_out, nr_tune=1000, dropout=0.0):
self.nr_out = nr_out
self.max_length = max_length
self.embed = Embedding(
def create_embedding(vectors, max_length, projected_dim):
return models.Sequential([
layers.Embedding(
vectors.shape[0],
vectors.shape[1],
input_length=max_length,
weights=[vectors],
name='embed',
trainable=False)
self.tune = Embedding(
nr_tune,
nr_out,
input_length=max_length,
weights=None,
name='tune',
trainable=True,
dropout=dropout)
self.mod_ids = Lambda(lambda sent: sent % (nr_tune-1)+1,
output_shape=(self.max_length,))
trainable=False),
self.project = TimeDistributed(
Dense(
nr_out,
layers.TimeDistributed(
layers.Dense(projected_dim,
activation=None,
bias=False,
name='project'))
use_bias=False))
])
def __call__(self, sentence):
def get_output_shape(shapes):
print(shapes)
return shapes[0]
mod_sent = self.mod_ids(sentence)
tuning = self.tune(mod_sent)
#tuning = merge([tuning, mod_sent],
# mode=lambda AB: AB[0] * (K.clip(K.cast(AB[1], 'float32'), 0, 1)),
# output_shape=(self.max_length, self.nr_out))
pretrained = self.project(self.embed(sentence))
vectors = merge([pretrained, tuning], mode='sum')
return vectors
def create_feedforward(num_units=200, activation='relu', dropout_rate=0.2):
return models.Sequential([
layers.Dense(num_units, activation=activation),
layers.Dropout(dropout_rate),
layers.Dense(num_units, activation=activation),
layers.Dropout(dropout_rate)
])
class _BiRNNEncoding(object):
def __init__(self, max_length, nr_out, dropout=0.0):
self.model = Sequential()
self.model.add(Bidirectional(LSTM(nr_out, return_sequences=True,
dropout_W=dropout, dropout_U=dropout),
input_shape=(max_length, nr_out)))
self.model.add(TimeDistributed(Dense(nr_out, activation='relu', init='he_normal')))
self.model.add(TimeDistributed(Dropout(0.2)))
def normalizer(axis):
def _normalize(att_weights):
exp_weights = K.exp(att_weights)
sum_weights = K.sum(exp_weights, axis=axis, keepdims=True)
return exp_weights/sum_weights
return _normalize
def __call__(self, sentence):
return self.model(sentence)
class _Attention(object):
def __init__(self, max_length, nr_hidden, dropout=0.0, L2=0.0, activation='relu'):
self.max_length = max_length
self.model = Sequential()
self.model.add(Dropout(dropout, input_shape=(nr_hidden,)))
self.model.add(
Dense(nr_hidden, name='attend1',
init='he_normal', W_regularizer=l2(L2),
input_shape=(nr_hidden,), activation='relu'))
self.model.add(Dropout(dropout))
self.model.add(Dense(nr_hidden, name='attend2',
init='he_normal', W_regularizer=l2(L2), activation='relu'))
self.model = TimeDistributed(self.model)
def __call__(self, sent1, sent2):
def _outer(AB):
att_ji = K.batch_dot(AB[1], K.permute_dimensions(AB[0], (0, 2, 1)))
return K.permute_dimensions(att_ji,(0, 2, 1))
return merge(
[self.model(sent1), self.model(sent2)],
mode=_outer,
output_shape=(self.max_length, self.max_length))
class _SoftAlignment(object):
def __init__(self, max_length, nr_hidden):
self.max_length = max_length
self.nr_hidden = nr_hidden
def __call__(self, sentence, attention, transpose=False):
def _normalize_attention(attmat):
att = attmat[0]
mat = attmat[1]
if transpose:
att = K.permute_dimensions(att,(0, 2, 1))
# 3d softmax
e = K.exp(att - K.max(att, axis=-1, keepdims=True))
s = K.sum(e, axis=-1, keepdims=True)
sm_att = e / s
return K.batch_dot(sm_att, mat)
return merge([attention, sentence], mode=_normalize_attention,
output_shape=(self.max_length, self.nr_hidden)) # Shape: (i, n)
class _Comparison(object):
def __init__(self, words, nr_hidden, L2=0.0, dropout=0.0):
self.words = words
self.model = Sequential()
self.model.add(Dropout(dropout, input_shape=(nr_hidden*2,)))
self.model.add(Dense(nr_hidden, name='compare1',
init='he_normal', W_regularizer=l2(L2)))
self.model.add(Activation('relu'))
self.model.add(Dropout(dropout))
self.model.add(Dense(nr_hidden, name='compare2',
W_regularizer=l2(L2), init='he_normal'))
self.model.add(Activation('relu'))
self.model = TimeDistributed(self.model)
def __call__(self, sent, align, **kwargs):
result = self.model(merge([sent, align], mode='concat')) # Shape: (i, n)
avged = GlobalAveragePooling1D()(result, mask=self.words)
maxed = GlobalMaxPooling1D()(result, mask=self.words)
merged = merge([avged, maxed])
result = BatchNormalization()(merged)
return result
class _Entailment(object):
def __init__(self, nr_hidden, nr_out, dropout=0.0, L2=0.0):
self.model = Sequential()
self.model.add(Dropout(dropout, input_shape=(nr_hidden*2,)))
self.model.add(Dense(nr_hidden, name='entail1',
init='he_normal', W_regularizer=l2(L2)))
self.model.add(Activation('relu'))
self.model.add(Dropout(dropout))
self.model.add(Dense(nr_hidden, name='entail2',
init='he_normal', W_regularizer=l2(L2)))
self.model.add(Activation('relu'))
self.model.add(Dense(nr_out, name='entail_out', activation='softmax',
W_regularizer=l2(L2), init='zero'))
def __call__(self, feats1, feats2):
features = merge([feats1, feats2], mode='concat')
return self.model(features)
class _GlobalSumPooling1D(Layer):
'''Global sum pooling operation for temporal data.
# Input shape
3D tensor with shape: `(samples, steps, features)`.
# Output shape
2D tensor with shape: `(samples, features)`.
'''
def __init__(self, **kwargs):
super(_GlobalSumPooling1D, self).__init__(**kwargs)
self.input_spec = [InputSpec(ndim=3)]
def get_output_shape_for(self, input_shape):
return (input_shape[0], input_shape[2])
def call(self, x, mask=None):
if mask is not None:
return K.sum(x * K.clip(mask, 0, 1), axis=1)
else:
def sum_word(x):
return K.sum(x, axis=1)
def test_build_model():
vectors = numpy.ndarray((100, 8), dtype='float32')
vectors = np.ndarray((100, 8), dtype='float32')
shape = (10, 16, 3)
settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True}
settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True, 'entail_dir':'both'}
model = build_model(vectors, shape, settings)
def test_fit_model():
def _generate_X(nr_example, length, nr_vector):
X1 = numpy.ndarray((nr_example, length), dtype='int32')
X1 = np.ndarray((nr_example, length), dtype='int32')
X1 *= X1 < nr_vector
X1 *= 0 <= X1
X2 = numpy.ndarray((nr_example, length), dtype='int32')
X2 = np.ndarray((nr_example, length), dtype='int32')
X2 *= X2 < nr_vector
X2 *= 0 <= X2
return [X1, X2]
def _generate_Y(nr_example, nr_class):
ys = numpy.zeros((nr_example, nr_class), dtype='int32')
ys = np.zeros((nr_example, nr_class), dtype='int32')
for i in range(nr_example):
ys[i, i % nr_class] = 1
return ys
vectors = numpy.ndarray((100, 8), dtype='float32')
vectors = np.ndarray((100, 8), dtype='float32')
shape = (10, 16, 3)
settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True}
settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True, 'entail_dir':'both'}
model = build_model(vectors, shape, settings)
train_X = _generate_X(20, shape[0], vectors.shape[0])
@ -261,8 +139,7 @@ def test_fit_model():
dev_X = _generate_X(15, shape[0], vectors.shape[0])
dev_Y = _generate_Y(15, shape[2])
model.fit(train_X, train_Y, validation_data=(dev_X, dev_Y), nb_epoch=5,
batch_size=4)
model.fit(train_X, train_Y, validation_data=(dev_X, dev_Y), epochs=5, batch_size=4)
__all__ = [build_model]

keras_parikh_entailment/spacy_hook.py

@ -1,8 +1,5 @@
import numpy as np
from keras.models import model_from_json
import numpy
import numpy.random
import json
from spacy.tokens.span import Span
try:
import cPickle as pickle
@ -11,16 +8,23 @@ except ImportError:
class KerasSimilarityShim(object):
entailment_types = ["entailment", "contradiction", "neutral"]
@classmethod
def load(cls, path, nlp, get_features=None, max_length=100):
def load(cls, path, nlp, max_length=100, get_features=None):
if get_features is None:
get_features = get_word_ids
with (path / 'config.json').open() as file_:
model = model_from_json(file_.read())
with (path / 'model').open('rb') as file_:
weights = pickle.load(file_)
embeddings = get_embeddings(nlp.vocab)
model.set_weights([embeddings] + weights)
weights.insert(1, embeddings)
model.set_weights(weights)
return cls(model, get_features=get_features, max_length=max_length)
def __init__(self, model, get_features=None, max_length=100):
@ -32,58 +36,42 @@ class KerasSimilarityShim(object):
doc.user_hooks['similarity'] = self.predict
doc.user_span_hooks['similarity'] = self.predict
return doc
def predict(self, doc1, doc2):
x1 = self.get_features([doc1], max_length=self.max_length, tree_truncate=True)
x2 = self.get_features([doc2], max_length=self.max_length, tree_truncate=True)
x1 = self.get_features([doc1], max_length=self.max_length)
x2 = self.get_features([doc2], max_length=self.max_length)
scores = self.model.predict([x1, x2])
return scores[0]
return self.entailment_types[scores.argmax()], scores.max()
def get_embeddings(vocab, nr_unk=100):
nr_vector = max(lex.rank for lex in vocab) + 1
vectors = numpy.zeros((nr_vector+nr_unk+2, vocab.vectors_length), dtype='float32')
# the extra +1 is for a zero vector representing sentence-final padding
num_vectors = max(lex.rank for lex in vocab) + 2
# create random vectors for OOV tokens
oov = np.random.normal(size=(nr_unk, vocab.vectors_length))
oov = oov / oov.sum(axis=1, keepdims=True)
vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype='float32')
vectors[1:(nr_unk + 1), ] = oov
for lex in vocab:
if lex.has_vector:
vectors[lex.rank+1] = lex.vector / lex.vector_norm
if lex.has_vector and lex.vector_norm > 0:
vectors[nr_unk + lex.rank + 1] = lex.vector / lex.vector_norm
return vectors
def get_word_ids(docs, rnn_encode=False, tree_truncate=False, max_length=100, nr_unk=100):
Xs = numpy.zeros((len(docs), max_length), dtype='int32')
def get_word_ids(docs, max_length=100, nr_unk=100):
Xs = np.zeros((len(docs), max_length), dtype='int32')
for i, doc in enumerate(docs):
if tree_truncate:
if isinstance(doc, Span):
queue = [doc.root]
else:
queue = [sent.root for sent in doc.sents]
else:
queue = list(doc)
words = []
while len(words) <= max_length and queue:
word = queue.pop(0)
if rnn_encode or (not word.is_punct and not word.is_space):
words.append(word)
if tree_truncate:
queue.extend(list(word.lefts))
queue.extend(list(word.rights))
words.sort()
for j, token in enumerate(words):
if token.has_vector:
Xs[i, j] = token.rank+1
else:
Xs[i, j] = (token.shape % (nr_unk-1))+2
j += 1
if j >= max_length:
for j, token in enumerate(doc):
if j == max_length:
break
if token.has_vector:
Xs[i, j] = token.rank + nr_unk + 1
else:
Xs[i, len(words)] = 1
Xs[i, j] = token.rank % nr_unk + 1
return Xs
def create_similarity_pipeline(nlp, max_length=100):
return [
nlp.tagger,
nlp.entity,
nlp.parser,
KerasSimilarityShim.load(nlp.path / 'similarity', nlp, max_length)
]

examples/notebooks/Decompositional Attention.ipynb

@ -0,0 +1,955 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Natural language inference using spaCy and Keras"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook details an implementation of the natural language inference model presented in [(Parikh et al, 2016)](https://arxiv.org/abs/1606.01933). The model is notable for the small number of paramaters *and hyperparameters* it specifices, while still yielding good performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Constructing the dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We only need the GloVe vectors from spaCy, not a full NLP pipeline."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"nlp = spacy.load('en_vectors_web_lg')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Function to load the SNLI dataset. The categories are converted to one-shot representation. The function comes from an example in spaCy."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jds/tensorflow-gpu/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" from ._conv import register_converters as _register_converters\n",
"Using TensorFlow backend.\n"
]
}
],
"source": [
"import ujson as json\n",
"from keras.utils import to_categorical\n",
"\n",
"LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}\n",
"def read_snli(path):\n",
" texts1 = []\n",
" texts2 = []\n",
" labels = []\n",
" with open(path, 'r') as file_:\n",
" for line in file_:\n",
" eg = json.loads(line)\n",
" label = eg['gold_label']\n",
" if label == '-': # per Parikh, ignore - SNLI entries\n",
" continue\n",
" texts1.append(eg['sentence1'])\n",
" texts2.append(eg['sentence2'])\n",
" labels.append(LABELS[label])\n",
" return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because Keras can do the train/test split for us, we'll load *all* SNLI triples from one file."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"texts,hypotheses,labels = read_snli('snli/snli_1.0_train.jsonl')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def create_dataset(nlp, texts, hypotheses, num_oov, max_length, norm_vectors = True):\n",
" sents = texts + hypotheses\n",
" \n",
" # the extra +1 is for a zero vector represting NULL for padding\n",
" num_vectors = max(lex.rank for lex in nlp.vocab) + 2 \n",
" \n",
" # create random vectors for OOV tokens\n",
" oov = np.random.normal(size=(num_oov, nlp.vocab.vectors_length))\n",
" oov = oov / oov.sum(axis=1, keepdims=True)\n",
" \n",
" vectors = np.zeros((num_vectors + num_oov, nlp.vocab.vectors_length), dtype='float32')\n",
" vectors[num_vectors:, ] = oov\n",
" for lex in nlp.vocab:\n",
" if lex.has_vector and lex.vector_norm > 0:\n",
" vectors[lex.rank + 1] = lex.vector / lex.vector_norm if norm_vectors == True else lex.vector\n",
" \n",
" sents_as_ids = []\n",
" for sent in sents:\n",
" doc = nlp(sent)\n",
" word_ids = []\n",
" \n",
" for i, token in enumerate(doc):\n",
" # skip odd spaces from tokenizer\n",
" if token.has_vector and token.vector_norm == 0:\n",
" continue\n",
" \n",
" if i > max_length:\n",
" break\n",
" \n",
" if token.has_vector:\n",
" word_ids.append(token.rank + 1)\n",
" else:\n",
" # if we don't have a vector, pick an OOV entry\n",
" word_ids.append(token.rank % num_oov + num_vectors) \n",
" \n",
" # there must be a simpler way of generating padded arrays from lists...\n",
" word_id_vec = np.zeros((max_length), dtype='int')\n",
" clipped_len = min(max_length, len(word_ids))\n",
" word_id_vec[:clipped_len] = word_ids[:clipped_len]\n",
" sents_as_ids.append(word_id_vec)\n",
" \n",
" \n",
" return vectors, np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"sem_vectors, text_vectors, hypothesis_vectors = create_dataset(nlp, texts, hypotheses, 100, 50, True)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"texts_test,hypotheses_test,labels_test = read_snli('snli/snli_1.0_test.jsonl')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"_, text_vectors_test, hypothesis_vectors_test = create_dataset(nlp, texts_test, hypotheses_test, 100, 50, True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use spaCy to tokenize the sentences and return, when available, a semantic vector for each token. \n",
"\n",
"OOV terms (tokens for which no semantic vector is available) are assigned to one of a set of randomly-generated OOV vectors, per (Parikh et al, 2016).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we will clip sentences to 50 words maximum."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"from keras import layers, Model, models\n",
"from keras import backend as K"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The embedding layer copies the 300-dimensional GloVe vectors into GPU memory. Per (Parikh et al, 2016), the vectors, which are not adapted during training, are projected down to lower-dimensional vectors using a trained projection matrix."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def create_embedding(vectors, max_length, projected_dim):\n",
" return models.Sequential([\n",
" layers.Embedding(\n",
" vectors.shape[0],\n",
" vectors.shape[1],\n",
" input_length=max_length,\n",
" weights=[vectors],\n",
" trainable=False),\n",
" \n",
" layers.TimeDistributed(\n",
" layers.Dense(projected_dim,\n",
" activation=None,\n",
" use_bias=False))\n",
" ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Parikh model makes use of three feedforward blocks that construct nonlinear combinations of their input. Each block contains two ReLU layers and two dropout layers."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def create_feedforward(num_units=200, activation='relu', dropout_rate=0.2):\n",
" return models.Sequential([\n",
" layers.Dense(num_units, activation=activation),\n",
" layers.Dropout(dropout_rate),\n",
" layers.Dense(num_units, activation=activation),\n",
" layers.Dropout(dropout_rate)\n",
" ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic idea of the (Parikh et al, 2016) model is to:\n",
"\n",
"1. *Align*: Construct an alignment of subphrases in the text and hypothesis using an attention-like mechanism, called \"decompositional\" because the layer is applied to each of the two sentences individually rather than to their product. The dot product of the nonlinear transformations of the inputs is then normalized vertically and horizontally to yield a pair of \"soft\" alignment structures, from text->hypothesis and hypothesis->text. Concretely, for each word in one sentence, a multinomial distribution is computed over the words of the other sentence, by learning a multinomial logistic with softmax target.\n",
"2. *Compare*: Each word is now compared to its aligned phrase using a function modeled as a two-layer feedforward ReLU network. The output is a high-dimensional representation of the strength of association between word and aligned phrase.\n",
"3. *Aggregate*: The comparison vectors are summed, separately, for the text and the hypothesis. The result is two vectors: one that describes the degree of association of the text to the hypothesis, and the second, of the hypothesis to the text.\n",
"4. Finally, these two vectors are processed by a dense layer followed by a softmax classifier, as usual.\n",
"\n",
"Note that because in entailment the truth conditions of the consequent must be a subset of those of the antecedent, it is not obvious that we need both vectors in step (3). Entailment is not symmetric. It may be enough to just use the hypothesis->text vector. We will explore this possibility later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need a couple of little functions for Lambda layers to normalize and aggregate weights:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def normalizer(axis):\n",
" def _normalize(att_weights):\n",
" exp_weights = K.exp(att_weights)\n",
" sum_weights = K.sum(exp_weights, axis=axis, keepdims=True)\n",
" return exp_weights/sum_weights\n",
" return _normalize\n",
"\n",
"def sum_word(x):\n",
" return K.sum(x, axis=1)\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def build_model(vectors, max_length, num_hidden, num_classes, projected_dim, entail_dir='both'):\n",
" input1 = layers.Input(shape=(max_length,), dtype='int32', name='words1')\n",
" input2 = layers.Input(shape=(max_length,), dtype='int32', name='words2')\n",
" \n",
" # embeddings (projected)\n",
" embed = create_embedding(vectors, max_length, projected_dim)\n",
" \n",
" a = embed(input1)\n",
" b = embed(input2)\n",
" \n",
" # step 1: attend\n",
" F = create_feedforward(num_hidden)\n",
" att_weights = layers.dot([F(a), F(b)], axes=-1)\n",
" \n",
" G = create_feedforward(num_hidden)\n",
" \n",
" if entail_dir == 'both':\n",
" norm_weights_a = layers.Lambda(normalizer(1))(att_weights)\n",
" norm_weights_b = layers.Lambda(normalizer(2))(att_weights)\n",
" alpha = layers.dot([norm_weights_a, a], axes=1)\n",
" beta = layers.dot([norm_weights_b, b], axes=1)\n",
"\n",
" # step 2: compare\n",
" comp1 = layers.concatenate([a, beta])\n",
" comp2 = layers.concatenate([b, alpha])\n",
" v1 = layers.TimeDistributed(G)(comp1)\n",
" v2 = layers.TimeDistributed(G)(comp2)\n",
"\n",
" # step 3: aggregate\n",
" v1_sum = layers.Lambda(sum_word)(v1)\n",
" v2_sum = layers.Lambda(sum_word)(v2)\n",
" concat = layers.concatenate([v1_sum, v2_sum])\n",
" elif entail_dir == 'left':\n",
" norm_weights_a = layers.Lambda(normalizer(1))(att_weights)\n",
" alpha = layers.dot([norm_weights_a, a], axes=1)\n",
" comp2 = layers.concatenate([b, alpha])\n",
" v2 = layers.TimeDistributed(G)(comp2)\n",
" v2_sum = layers.Lambda(sum_word)(v2)\n",
" concat = v2_sum\n",
" else:\n",
" norm_weights_b = layers.Lambda(normalizer(2))(att_weights)\n",
" beta = layers.dot([norm_weights_b, b], axes=1)\n",
" comp1 = layers.concatenate([a, beta])\n",
" v1 = layers.TimeDistributed(G)(comp1)\n",
" v1_sum = layers.Lambda(sum_word)(v1)\n",
" concat = v1_sum\n",
" \n",
" H = create_feedforward(num_hidden)\n",
" out = H(concat)\n",
" out = layers.Dense(num_classes, activation='softmax')(out)\n",
" \n",
" model = Model([input1, input2], out)\n",
" \n",
" model.compile(optimizer='adam',\n",
" loss='categorical_crossentropy',\n",
" metrics=['accuracy'])\n",
" return model\n",
" \n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_1 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_2 (Sequential) (None, 50, 200) 80400 sequential_1[1][0] \n",
" sequential_1[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_1 (Dot) (None, 50, 50) 0 sequential_2[1][0] \n",
" sequential_2[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_2 (Lambda) (None, 50, 50) 0 dot_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_1 (Lambda) (None, 50, 50) 0 dot_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_3 (Dot) (None, 50, 200) 0 lambda_2[0][0] \n",
" sequential_1[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_2 (Dot) (None, 50, 200) 0 lambda_1[0][0] \n",
" sequential_1[1][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_1 (Concatenate) (None, 50, 400) 0 sequential_1[1][0] \n",
" dot_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_2 (Concatenate) (None, 50, 400) 0 sequential_1[2][0] \n",
" dot_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_2 (TimeDistrib (None, 50, 200) 120400 concatenate_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_3 (TimeDistrib (None, 50, 200) 120400 concatenate_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_3 (Lambda) (None, 200) 0 time_distributed_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_4 (Lambda) (None, 200) 0 time_distributed_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_3 (Concatenate) (None, 400) 0 lambda_3[0][0] \n",
" lambda_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_4 (Sequential) (None, 200) 120400 concatenate_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_8 (Dense) (None, 3) 603 sequential_4[1][0] \n",
"==================================================================================================\n",
"Total params: 321,703,403\n",
"Trainable params: 381,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"K.clear_session()\n",
"m = build_model(sem_vectors, 50, 200, 3, 200)\n",
"m.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of trainable parameters, ~381k, is the number given by Parikh et al, so we're on the right track."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parikh et al use tiny batches of 4, training for 50MM batches, which amounts to around 500 epochs. Here we'll use large batches to better use the GPU, and train for fewer epochs -- for purposes of this experiment."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 549367 samples, validate on 9824 samples\n",
"Epoch 1/50\n",
"549367/549367 [==============================] - 34s 62us/step - loss: 0.7599 - acc: 0.6617 - val_loss: 0.5396 - val_acc: 0.7861\n",
"Epoch 2/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.5611 - acc: 0.7763 - val_loss: 0.4892 - val_acc: 0.8085\n",
"Epoch 3/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.5212 - acc: 0.7948 - val_loss: 0.4574 - val_acc: 0.8261\n",
"Epoch 4/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4986 - acc: 0.8045 - val_loss: 0.4410 - val_acc: 0.8274\n",
"Epoch 5/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4819 - acc: 0.8114 - val_loss: 0.4224 - val_acc: 0.8383\n",
"Epoch 6/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4714 - acc: 0.8166 - val_loss: 0.4200 - val_acc: 0.8379\n",
"Epoch 7/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4633 - acc: 0.8203 - val_loss: 0.4098 - val_acc: 0.8457\n",
"Epoch 8/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4558 - acc: 0.8232 - val_loss: 0.4114 - val_acc: 0.8415\n",
"Epoch 9/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4508 - acc: 0.8250 - val_loss: 0.4062 - val_acc: 0.8477\n",
"Epoch 10/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4433 - acc: 0.8286 - val_loss: 0.3982 - val_acc: 0.8486\n",
"Epoch 11/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4388 - acc: 0.8307 - val_loss: 0.3953 - val_acc: 0.8497\n",
"Epoch 12/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4351 - acc: 0.8321 - val_loss: 0.3973 - val_acc: 0.8522\n",
"Epoch 13/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4309 - acc: 0.8342 - val_loss: 0.3939 - val_acc: 0.8539\n",
"Epoch 14/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4269 - acc: 0.8355 - val_loss: 0.3932 - val_acc: 0.8517\n",
"Epoch 15/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4247 - acc: 0.8369 - val_loss: 0.3938 - val_acc: 0.8515\n",
"Epoch 16/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4208 - acc: 0.8379 - val_loss: 0.3936 - val_acc: 0.8504\n",
"Epoch 17/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4194 - acc: 0.8390 - val_loss: 0.3885 - val_acc: 0.8560\n",
"Epoch 18/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4162 - acc: 0.8402 - val_loss: 0.3874 - val_acc: 0.8561\n",
"Epoch 19/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4140 - acc: 0.8409 - val_loss: 0.3889 - val_acc: 0.8545\n",
"Epoch 20/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4114 - acc: 0.8426 - val_loss: 0.3864 - val_acc: 0.8583\n",
"Epoch 21/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4092 - acc: 0.8430 - val_loss: 0.3870 - val_acc: 0.8561\n",
"Epoch 22/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4062 - acc: 0.8442 - val_loss: 0.3852 - val_acc: 0.8577\n",
"Epoch 23/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4050 - acc: 0.8450 - val_loss: 0.3850 - val_acc: 0.8578\n",
"Epoch 24/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4035 - acc: 0.8455 - val_loss: 0.3825 - val_acc: 0.8555\n",
"Epoch 25/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4018 - acc: 0.8460 - val_loss: 0.3837 - val_acc: 0.8573\n",
"Epoch 26/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3989 - acc: 0.8476 - val_loss: 0.3843 - val_acc: 0.8599\n",
"Epoch 27/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3979 - acc: 0.8481 - val_loss: 0.3841 - val_acc: 0.8589\n",
"Epoch 28/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3967 - acc: 0.8484 - val_loss: 0.3811 - val_acc: 0.8575\n",
"Epoch 29/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3956 - acc: 0.8492 - val_loss: 0.3829 - val_acc: 0.8589\n",
"Epoch 30/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3938 - acc: 0.8499 - val_loss: 0.3859 - val_acc: 0.8562\n",
"Epoch 31/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3925 - acc: 0.8500 - val_loss: 0.3798 - val_acc: 0.8587\n",
"Epoch 32/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3906 - acc: 0.8509 - val_loss: 0.3834 - val_acc: 0.8569\n",
"Epoch 33/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3893 - acc: 0.8511 - val_loss: 0.3806 - val_acc: 0.8588\n",
"Epoch 34/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3885 - acc: 0.8515 - val_loss: 0.3828 - val_acc: 0.8603\n",
"Epoch 35/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3879 - acc: 0.8520 - val_loss: 0.3800 - val_acc: 0.8594\n",
"Epoch 36/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3860 - acc: 0.8530 - val_loss: 0.3796 - val_acc: 0.8577\n",
"Epoch 37/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3856 - acc: 0.8532 - val_loss: 0.3857 - val_acc: 0.8591\n",
"Epoch 38/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3838 - acc: 0.8535 - val_loss: 0.3835 - val_acc: 0.8603\n",
"Epoch 39/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3830 - acc: 0.8543 - val_loss: 0.3830 - val_acc: 0.8599\n",
"Epoch 40/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3818 - acc: 0.8548 - val_loss: 0.3832 - val_acc: 0.8559\n",
"Epoch 41/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3806 - acc: 0.8551 - val_loss: 0.3845 - val_acc: 0.8553\n",
"Epoch 42/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3803 - acc: 0.8550 - val_loss: 0.3789 - val_acc: 0.8617\n",
"Epoch 43/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3791 - acc: 0.8556 - val_loss: 0.3835 - val_acc: 0.8580\n",
"Epoch 44/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3778 - acc: 0.8565 - val_loss: 0.3799 - val_acc: 0.8580\n",
"Epoch 45/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3766 - acc: 0.8571 - val_loss: 0.3790 - val_acc: 0.8625\n",
"Epoch 46/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3770 - acc: 0.8569 - val_loss: 0.3820 - val_acc: 0.8590\n",
"Epoch 47/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3761 - acc: 0.8573 - val_loss: 0.3831 - val_acc: 0.8581\n",
"Epoch 48/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3739 - acc: 0.8579 - val_loss: 0.3828 - val_acc: 0.8599\n",
"Epoch 49/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3738 - acc: 0.8577 - val_loss: 0.3785 - val_acc: 0.8590\n",
"Epoch 50/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3726 - acc: 0.8580 - val_loss: 0.3820 - val_acc: 0.8585\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7f5c9f49c438>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=50,validation_data=([text_vectors_test, hypothesis_vectors_test], labels_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is broadly in the region reported by Parikh et al: ~86 vs 86.3%. The small difference might be accounted by differences in `max_length` (here set at 50), in the training regime, and that here we use Keras' built-in validation splitting rather than the SNLI test set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiment: the asymmetric model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It was suggested earlier that, based on the semantics of entailment, the vector representing the strength of association between the hypothesis to the text is all that is needed for classifying the entailment.\n",
"\n",
"The following model removes consideration of the complementary vector (text to hypothesis) from the computation. This will decrease the paramater count slightly, because the final dense layers will be smaller, and speed up the forward pass when predicting, because fewer calculations will be needed."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_5 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_6 (Sequential) (None, 50, 200) 80400 sequential_5[1][0] \n",
" sequential_5[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_4 (Dot) (None, 50, 50) 0 sequential_6[1][0] \n",
" sequential_6[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_5 (Lambda) (None, 50, 50) 0 dot_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_5 (Dot) (None, 50, 200) 0 lambda_5[0][0] \n",
" sequential_5[1][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_4 (Concatenate) (None, 50, 400) 0 sequential_5[2][0] \n",
" dot_5[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_5 (TimeDistrib (None, 50, 200) 120400 concatenate_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_6 (Lambda) (None, 200) 0 time_distributed_5[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_8 (Sequential) (None, 200) 80400 lambda_6[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_16 (Dense) (None, 3) 603 sequential_8[1][0] \n",
"==================================================================================================\n",
"Total params: 321,663,403\n",
"Trainable params: 341,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"m1 = build_model(sem_vectors, 50, 200, 3, 200, 'left')\n",
"m1.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter count has indeed decreased by 40,000, corresponding to the 200x200 smaller H function."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 549367 samples, validate on 9824 samples\n",
"Epoch 1/50\n",
"549367/549367 [==============================] - 25s 46us/step - loss: 0.7331 - acc: 0.6770 - val_loss: 0.5257 - val_acc: 0.7936\n",
"Epoch 2/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.5518 - acc: 0.7799 - val_loss: 0.4717 - val_acc: 0.8159\n",
"Epoch 3/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.5147 - acc: 0.7967 - val_loss: 0.4449 - val_acc: 0.8278\n",
"Epoch 4/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4948 - acc: 0.8060 - val_loss: 0.4326 - val_acc: 0.8344\n",
"Epoch 5/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4814 - acc: 0.8122 - val_loss: 0.4247 - val_acc: 0.8359\n",
"Epoch 6/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4712 - acc: 0.8162 - val_loss: 0.4143 - val_acc: 0.8430\n",
"Epoch 7/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4635 - acc: 0.8205 - val_loss: 0.4172 - val_acc: 0.8401\n",
"Epoch 8/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4570 - acc: 0.8223 - val_loss: 0.4106 - val_acc: 0.8422\n",
"Epoch 9/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4505 - acc: 0.8259 - val_loss: 0.4043 - val_acc: 0.8451\n",
"Epoch 10/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4459 - acc: 0.8280 - val_loss: 0.4050 - val_acc: 0.8467\n",
"Epoch 11/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4405 - acc: 0.8300 - val_loss: 0.3975 - val_acc: 0.8481\n",
"Epoch 12/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4360 - acc: 0.8324 - val_loss: 0.4026 - val_acc: 0.8496\n",
"Epoch 13/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4327 - acc: 0.8334 - val_loss: 0.4024 - val_acc: 0.8471\n",
"Epoch 14/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4293 - acc: 0.8350 - val_loss: 0.3955 - val_acc: 0.8496\n",
"Epoch 15/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4263 - acc: 0.8369 - val_loss: 0.3980 - val_acc: 0.8490\n",
"Epoch 16/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4236 - acc: 0.8377 - val_loss: 0.3958 - val_acc: 0.8496\n",
"Epoch 17/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4213 - acc: 0.8384 - val_loss: 0.3954 - val_acc: 0.8496\n",
"Epoch 18/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4187 - acc: 0.8394 - val_loss: 0.3929 - val_acc: 0.8514\n",
"Epoch 19/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4157 - acc: 0.8409 - val_loss: 0.3939 - val_acc: 0.8507\n",
"Epoch 20/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4135 - acc: 0.8417 - val_loss: 0.3953 - val_acc: 0.8522\n",
"Epoch 21/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4122 - acc: 0.8424 - val_loss: 0.3974 - val_acc: 0.8506\n",
"Epoch 22/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4099 - acc: 0.8435 - val_loss: 0.3918 - val_acc: 0.8522\n",
"Epoch 23/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4075 - acc: 0.8443 - val_loss: 0.3901 - val_acc: 0.8513\n",
"Epoch 24/50\n",
"549367/549367 [==============================] - 24s 44us/step - loss: 0.4067 - acc: 0.8447 - val_loss: 0.3885 - val_acc: 0.8543\n",
"Epoch 25/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4047 - acc: 0.8454 - val_loss: 0.3846 - val_acc: 0.8531\n",
"Epoch 26/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4031 - acc: 0.8461 - val_loss: 0.3864 - val_acc: 0.8562\n",
"Epoch 27/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4020 - acc: 0.8467 - val_loss: 0.3874 - val_acc: 0.8546\n",
"Epoch 28/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4001 - acc: 0.8473 - val_loss: 0.3848 - val_acc: 0.8534\n",
"Epoch 29/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3991 - acc: 0.8479 - val_loss: 0.3865 - val_acc: 0.8562\n",
"Epoch 30/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3976 - acc: 0.8484 - val_loss: 0.3833 - val_acc: 0.8574\n",
"Epoch 31/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3961 - acc: 0.8487 - val_loss: 0.3846 - val_acc: 0.8585\n",
"Epoch 32/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3942 - acc: 0.8498 - val_loss: 0.3805 - val_acc: 0.8573\n",
"Epoch 33/50\n",
"549367/549367 [==============================] - 24s 44us/step - loss: 0.3935 - acc: 0.8503 - val_loss: 0.3856 - val_acc: 0.8579\n",
"Epoch 34/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3923 - acc: 0.8507 - val_loss: 0.3829 - val_acc: 0.8560\n",
"Epoch 35/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3920 - acc: 0.8508 - val_loss: 0.3864 - val_acc: 0.8575\n",
"Epoch 36/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3907 - acc: 0.8516 - val_loss: 0.3873 - val_acc: 0.8563\n",
"Epoch 37/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3891 - acc: 0.8519 - val_loss: 0.3850 - val_acc: 0.8570\n",
"Epoch 38/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3872 - acc: 0.8522 - val_loss: 0.3815 - val_acc: 0.8591\n",
"Epoch 39/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3887 - acc: 0.8520 - val_loss: 0.3829 - val_acc: 0.8590\n",
"Epoch 40/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3868 - acc: 0.8531 - val_loss: 0.3807 - val_acc: 0.8600\n",
"Epoch 41/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3859 - acc: 0.8537 - val_loss: 0.3832 - val_acc: 0.8574\n",
"Epoch 42/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3849 - acc: 0.8537 - val_loss: 0.3850 - val_acc: 0.8576\n",
"Epoch 43/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3834 - acc: 0.8541 - val_loss: 0.3825 - val_acc: 0.8563\n",
"Epoch 44/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3829 - acc: 0.8548 - val_loss: 0.3844 - val_acc: 0.8540\n",
"Epoch 45/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3816 - acc: 0.8552 - val_loss: 0.3841 - val_acc: 0.8559\n",
"Epoch 46/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3816 - acc: 0.8549 - val_loss: 0.3880 - val_acc: 0.8567\n",
"Epoch 47/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.3799 - acc: 0.8559 - val_loss: 0.3767 - val_acc: 0.8635\n",
"Epoch 48/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3800 - acc: 0.8560 - val_loss: 0.3786 - val_acc: 0.8563\n",
"Epoch 49/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3781 - acc: 0.8563 - val_loss: 0.3812 - val_acc: 0.8596\n",
"Epoch 50/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3788 - acc: 0.8560 - val_loss: 0.3782 - val_acc: 0.8601\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7f5ca1bf3e48>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=50,validation_data=([text_vectors_test, hypothesis_vectors_test], labels_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This model performs the same as the slightly more complex model that evaluates alignments in both directions. Note also that processing time is improved, from 64 down to 48 microseconds per step. \n",
"\n",
"Let's now look at an asymmetric model that evaluates text to hypothesis comparisons. The prediction is that such a model will correctly classify a decent proportion of the exemplars, but not as accurately as the previous two.\n",
"\n",
"We'll just use 10 epochs for expediency."
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_13 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_14 (Sequential) (None, 50, 200) 80400 sequential_13[1][0] \n",
" sequential_13[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_8 (Dot) (None, 50, 50) 0 sequential_14[1][0] \n",
" sequential_14[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_9 (Lambda) (None, 50, 50) 0 dot_8[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_9 (Dot) (None, 50, 200) 0 lambda_9[0][0] \n",
" sequential_13[2][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_6 (Concatenate) (None, 50, 400) 0 sequential_13[1][0] \n",
" dot_9[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_9 (TimeDistrib (None, 50, 200) 120400 concatenate_6[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_10 (Lambda) (None, 200) 0 time_distributed_9[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_16 (Sequential) (None, 200) 80400 lambda_10[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_32 (Dense) (None, 3) 603 sequential_16[1][0] \n",
"==================================================================================================\n",
"Total params: 321,663,403\n",
"Trainable params: 341,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"m2 = build_model(sem_vectors, 50, 200, 3, 200, 'right')\n",
"m2.summary()"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 455226 samples, validate on 113807 samples\n",
"Epoch 1/10\n",
"455226/455226 [==============================] - 22s 49us/step - loss: 0.8920 - acc: 0.5771 - val_loss: 0.8001 - val_acc: 0.6435\n",
"Epoch 2/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7808 - acc: 0.6553 - val_loss: 0.7267 - val_acc: 0.6855\n",
"Epoch 3/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7329 - acc: 0.6825 - val_loss: 0.6966 - val_acc: 0.7006\n",
"Epoch 4/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7055 - acc: 0.6978 - val_loss: 0.6713 - val_acc: 0.7150\n",
"Epoch 5/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6862 - acc: 0.7081 - val_loss: 0.6533 - val_acc: 0.7253\n",
"Epoch 6/10\n",
"455226/455226 [==============================] - 21s 47us/step - loss: 0.6694 - acc: 0.7179 - val_loss: 0.6472 - val_acc: 0.7277\n",
"Epoch 7/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6555 - acc: 0.7252 - val_loss: 0.6338 - val_acc: 0.7347\n",
"Epoch 8/10\n",
"455226/455226 [==============================] - 22s 48us/step - loss: 0.6434 - acc: 0.7310 - val_loss: 0.6246 - val_acc: 0.7385\n",
"Epoch 9/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6325 - acc: 0.7367 - val_loss: 0.6164 - val_acc: 0.7424\n",
"Epoch 10/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6216 - acc: 0.7426 - val_loss: 0.6082 - val_acc: 0.7478\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7fa6850cf080>"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m2.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=10,validation_split=.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing this fit to the validation accuracy of the previous two models after 10 epochs, we observe that its accuracy is roughly 10% lower.\n",
"\n",
"It is reassuring that the neural modeling here reproduces what we know from the semantics of natural language!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -6,8 +6,10 @@ from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .lemmatizer import LOOKUP
from .tag_map import TAG_MAP
from .norm_exceptions import NORM_EXCEPTIONS
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_PREFIXES
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
@ -17,13 +19,14 @@ from ...util import update_exc, add_lookups
class PortugueseDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: 'pt'
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS)
lex_attr_getters.update(LEX_ATTRS)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
stop_words = STOP_WORDS
lemma_lookup = LOOKUP
tag_map = TAG_MAP
infixes = TOKENIZER_INFIXES
prefixes = TOKENIZER_PREFIXES
class Portuguese(Language):
lang = 'pt'

View File

@ -23,7 +23,7 @@ _ordinal_words = ['primeiro', 'segundo', 'terceiro', 'quarto', 'quinto', 'sexto'
def like_num(text):
text = text.replace(',', '').replace('.', '')
text = text.replace(',', '').replace('.', '').replace('º','').replace('ª','')
if text.isdigit():
return True
if text.count('/') == 1:

View File

@ -0,0 +1,23 @@
# coding: utf8
from __future__ import unicode_literals
# These exceptions are used to add NORM values based on a token's ORTH value.
# Individual languages can also add their own exceptions and overwrite them -
# for example, British vs. American spelling in English.
# Norms are only set if no alternative is provided in the tokenizer exceptions.
# Note that this does not change any other token attributes. Its main purpose
# is to normalise the word representations so that equivalent tokens receive
# similar representations. For example: $ and € are very different, but they're
# both currency symbols. By normalising currency symbols to $, all symbols are
# seen as similar, no matter how common they are in the training data.
NORM_EXCEPTIONS = {
"R$": "$", # Real
"r$": "$", # Real
"Cz$": "$", # Cruzado
"cz$": "$", # Cruzado
"NCz$": "$", # Cruzado Novo
"ncz$": "$" # Cruzado Novo
}
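
For context, here is a minimal usage sketch (hypothetical sentence; assumes spaCy v2.x with the Portuguese language data from this commit) of how these norm exceptions are meant to surface: the currency token's `ORTH` stays as written, while its `NORM` collapses to `$`.

```python
# coding: utf8
from __future__ import unicode_literals

from spacy.lang.pt import Portuguese

nlp = Portuguese()
doc = nlp(u'O livro custa R$ 30.')
# the R$ token should report a normalised form of '$',
# while its surface text is left unchanged
print([(t.orth_, t.norm_) for t in doc])
```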

View File

@ -0,0 +1,18 @@
# coding: utf8
from __future__ import unicode_literals
from ..punctuation import TOKENIZER_PREFIXES as BASE_TOKENIZER_PREFIXES
from ..punctuation import TOKENIZER_SUFFIXES as BASE_TOKENIZER_SUFFIXES
from ..punctuation import TOKENIZER_INFIXES as BASE_TOKENIZER_INFIXES
_prefixes = ([r'\w{1,3}\$'] + BASE_TOKENIZER_PREFIXES)
_suffixes = (BASE_TOKENIZER_SUFFIXES)
_infixes = ([r'(\w+-\w+(-\w+)*)'] +
BASE_TOKENIZER_INFIXES
)
TOKENIZER_PREFIXES = _prefixes
TOKENIZER_SUFFIXES = _suffixes
TOKENIZER_INFIXES = _infixes

View File

@ -3,67 +3,66 @@ from __future__ import unicode_literals
STOP_WORDS = set("""
à às acerca adeus agora ainda algo algumas alguns ali além ambas ambos ano
anos antes ao aos apenas apoio apoia apontar após aquela aquelas aquele aqueles
aqui aquilo área as assim através atrás até
à às área acerca ademais adeus agora ainda algo algumas alguns ali além ambas ambos antes
ao aos apenas apoia apoio apontar após aquela aquelas aquele aqueles aqui aquilo
as assim através atrás até
baixo bastante bem boa bom breve
cada caminho catorze cedo cento certamente certeza cima cinco coisa com como
comprido comprida conhecida conhecido conselho contra corrente custa
comprida comprido conhecida conhecido conselho contra contudo corrente cuja
cujo custa
da daquela daquele dar das de debaixo demais dentro depois desde desligada
desligado dessa desse desta deste deve devem deverá dez dezanove dezasseis
dezassete dezoito dia diante direita diz dizem dizer do dois dos doze duas
dão dúvida
da daquela daquele dar das de debaixo demais dentro depois des desde dessa desse
desta deste deve devem deverá dez dezanove dezasseis dezassete dezoito diante
direita disso diz dizem dizer do dois dos doze duas dão
é ela elas ele eles em embora enquanto entre então era és essa essas esse esses
esta estado estar estará estas estava este estes esteve estive estivemos
estiveram estiveste estivestes estou está estás estão eu exemplo
é és ela elas ele eles em embora enquanto entre então era essa essas esse esses esta
estado estar estará estas estava este estes esteve estive estivemos estiveram
estiveste estivestes estou está estás estão eu eventual exemplo
falta fará favor faz fazeis fazem fazemos fazer fazes fazia faço fez fim final
foi fomos for fora foram forma foste fostes fui
geral grande grandes grupo
hoje horas
inclusive iniciar inicio ir irá isso isto
iniciar inicio ir irá isso isto
lado ligado local logo longe lugar
lado lhe ligado local logo longe lugar
maior maioria maiorias mais mal mas me meio menor menos meses mesmo meu meus
mil minha minhas momento muito muitos máximo mês
maior maioria maiorias mais mal mas me meio menor menos meses mesmo meu meus mil
minha minhas momento muito muitos máximo mês
na nada naquela naquele nas nem nenhuma nessa nesse nesta neste no noite nome
nos nossa nossas nosso nossos nova novas nove novo novos num numa nunca nuns
não nível nós número números
na nada naquela naquele nas nem nenhuma nessa nesse nesta neste no nos nossa
nossas nosso nossos nova novas nove novo novos num numa nunca nuns não nível nós
número números
obra obrigada obrigado oitava oitavo oito onde ontem onze os ou outra outras
outro outros
obrigada obrigado oitava oitavo oito onde ontem onze ora os ou outra outras outros
para parece parte partir pegar pela pelas pelo pelos perto pessoas pode podem
poder poderá podia ponto pontos por porque porquê posição possivelmente posso
possível pouca pouco povo primeira primeiro próprio próxima próximo puderam pôde
põe põem
para parece parte partir pegar pela pelas pelo pelos perto pode podem poder poderá
podia pois ponto pontos por porquanto porque porquê portanto porém posição
possivelmente posso possível pouca pouco povo primeira primeiro próprio próxima
próximo puderam pôde põe põem
qual qualquer quando quanto quarta quarto quatro que quem quer querem quero
quais qual qualquer quando quanto quarta quarto quatro que quem quer querem quero
questão quieta quieto quinta quinto quinze quê
relação
sabe saber se segunda segundo sei seis sem sempre ser seria sete seu seus sexta
sexto sim sistema sob sobre sois somente somos sou sua suas são sétima sétimo
sexto sim sistema sob sobre sois somente somos sou sua suas são sétima sétimo
tal talvez também tanta tanto tarde te tem temos tempo tendes tenho tens tentar
tentaram tente tentei ter terceira terceiro teu teus teve tipo tive tivemos
tiveram tiveste tivestes toda todas todo todos trabalhar trabalho treze três tu
tua tuas tudo tão têm
tais tal talvez também tanta tanto tarde te tem temos tempo tendes tenho tens
tentar tentaram tente tentei ter terceira terceiro teu teus teve tipo tive
tivemos tiveram tiveste tivestes toda todas todo todos treze três tu tua tuas
tudo tão têm
último um uma umas uns usa usar
um uma umas uns usa usar último
vai vais valor veja vem vens ver verdade verdadeira verdadeiro vez vezes viagem
vinda vindo vinte você vocês vos vossa vossas vosso vossos vários vão vêm vós
vai vais valor veja vem vens ver vez vezes vinda vindo vinte você vocês vos vossa
vossas vosso vossos vários vão vêm vós
zero
""".split())

View File

@ -67,7 +67,7 @@ for orth in _per_pron + _dem_pron + _und_pron:
for orth in [
"Adm.", "Dr.", "e.g.", "E.g.", "E.G.", "Gen.", "Gov.", "i.e.", "I.e.",
"I.E.", "Jr.", "Ltd.", "p.m.", "Ph.D.", "Rep.", "Rev.", "Sen.", "Sr.",
"Sra.", "vs."]:
"Sra.", "vs.", "tel.", "pág.", "pag."]:
_exc[orth] = [{ORTH: orth}]

View File

@ -6,7 +6,7 @@ p
| but somewhat ugly in Python. Logic that deals with Python or platform
| compatibility only lives in #[code spacy.compat]. To distinguish them from
| the builtin functions, replacement functions are suffixed with an
| undersocre, e.e #[code unicode_].
| underscore, e.e #[code unicode_].
+aside-code("Example").
from spacy.compat import unicode_, json_dumps

View File

@ -184,7 +184,7 @@
"from spacy_lookup import Entity",
"",
"nlp = spacy.load('en')",
"entity = Entity(nlp, keywords_list=['python', 'java platform'])",
"entity = Entity(keywords_list=['python', 'java platform'])",
"nlp.add_pipe(entity, last=True)",
"",
"doc = nlp(u\"I am a product manager for a java and python.\")",