<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# A decomposable attention model for Natural Language Inference

**by Matthew Honnibal, [@honnibal](https://github.com/honnibal)**

This directory contains an implementation of the entailment prediction model described
by [Parikh et al. (2016)](https://arxiv.org/pdf/1606.01933.pdf). The model is notable
for its competitive performance with very few parameters.

The model is implemented using [Keras](https://keras.io/) and [spaCy](https://spacy.io).
Keras is used to build and train the network. spaCy is used to load
the [GloVe](http://nlp.stanford.edu/projects/glove/) vectors, perform the
feature extraction, and help you apply the model at run-time. The following
demo code shows how the entailment model can be used at runtime, once the
hook is installed to customise the `.similarity()` method of spaCy's `Doc`
and `Span` objects:

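```python
import spacy

# Assumption: create_similarity_pipeline lives in this example's spacy_hook.py.
# It installs the hook that routes .similarity() to the entailment model.
from spacy_hook import create_similarity_pipeline


def demo(model_dir):
    nlp = spacy.load('en', path=model_dir,
            create_pipeline=create_similarity_pipeline)
    doc1 = nlp(u'Worst fries ever! Greasy and horrible...')
    doc2 = nlp(u'The milkshakes are good. The fries are bad.')
    # Compare the two documents as wholes.
    print(doc1.similarity(doc2))
    # Sentences are Span objects and go through the same hook.
    sent1a, sent1b = doc1.sents
    print(sent1a.similarity(sent1b))
    print(sent1a.similarity(doc2))
    print(sent1b.similarity(doc2))
```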
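
For illustration, here is a minimal sketch of how such a hook could be installed. It relies on spaCy's `user_hooks` and `user_span_hooks` dictionaries, which `Doc.similarity()` and `Span.similarity()` consult before falling back to the default. The class name `SimilarityHook` and the toy `overlap` function are hypothetical stand-ins for this sketch, not the code in `spacy_hook.py`.

```python
class SimilarityHook(object):
    """Hypothetical pipeline component: routes .similarity() to a custom function."""

    def __init__(self, similarity_fn):
        # similarity_fn(obj1, obj2) -> float; stands in for your trained model.
        self.similarity_fn = similarity_fn

    def __call__(self, doc):
        # spaCy checks these dicts before using its default similarity.
        doc.user_hooks['similarity'] = self.similarity_fn
        doc.user_span_hooks['similarity'] = self.similarity_fn
        return doc


def overlap(obj1, obj2):
    """Toy similarity (an assumption for this sketch): Jaccard overlap of lowercased tokens."""
    words1 = set(token.lower_ for token in obj1)
    words2 = set(token.lower_ for token in obj2)
    union = words1 | words2
    return len(words1 & words2) / float(len(union)) if union else 0.0
```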

I'm working on a blog post to explain Parikh et al.'s model in more detail.
I think it is a very interesting example of the attention mechanism, which
I didn't understand very well before working through this paper. There are
lots of ways to extend the model.

## What's where

| File | Description |
| --- | --- |
| `__main__.py` | The script that will be executed. Defines the CLI, the data reading, etc. (all the boring stuff). |
| `spacy_hook.py` | Provides a class `SimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
| `keras_decomposable_attention.py` | Defines the neural network model. |

## Setting up

First, install [Keras](https://keras.io/), [spaCy](https://spacy.io) and the spaCy
English models (about 1GB of data):

```bash
pip install keras spacy
python -m spacy.en.download
```

You'll also want to get Keras working on your GPU. This will depend on your
setup, so you're mostly on your own for this step. If you're using AWS, try the
[NVidia AMI](https://aws.amazon.com/marketplace/pp/B00FYCDDTE). It made things pretty easy.

Once you've installed the dependencies, you can run a small preliminary test of
the Keras model:

```bash
py.test keras_parikh_entailment/keras_decomposable_attention.py
```

This compiles the model and fits it with some dummy data. You should see that
both tests passed.

Finally, download the [Stanford Natural Language Inference corpus](http://nlp.stanford.edu/projects/snli/).
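
The corpus is distributed as JSON Lines files, with one premise/hypothesis pair per line. As a rough sketch of what reading it might involve, each record's `sentence1`, `sentence2` and `gold_label` fields can be collected as shown here. The function name `read_snli` and the label ordering in `LABELS` are assumptions made for illustration, not necessarily what `__main__.py` does.

```python
import io
import json

# Assumed label-to-index mapping; any consistent ordering would work.
LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}


def read_snli(path):
    """Return (premises, hypotheses, label_ids) from an SNLI .jsonl file."""
    premises, hypotheses, label_ids = [], [], []
    with io.open(path, encoding='utf8') as file_:
        for line in file_:
            record = json.loads(line)
            label = record['gold_label']
            # Pairs without annotator consensus are labelled '-'; skip them.
            if label == '-':
                continue
            premises.append(record['sentence1'])
            hypotheses.append(record['sentence2'])
            label_ids.append(LABELS[label])
    return premises, hypotheses, label_ids
```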

## Running the example

You can run the `keras_parikh_entailment/` directory as a script, which executes the file
[`keras_parikh_entailment/__main__.py`](__main__.py). The first thing you'll want to do is train the model:

```bash
python keras_parikh_entailment/ train <your_model_dir> <train_directory> <dev_directory>
```

Training takes about 300 epochs for full accuracy, and I haven't rerun the full
experiment since refactoring things to publish this example, so please let me
know if I've broken something. You should get to at least 85% accuracy on the development data.

The other two modes demonstrate run-time usage. I never like relying on the accuracy printed
by `.fit()` methods. I never really feel confident until I've run a new process that loads
the model and starts making predictions, without access to the gold labels. I've therefore
included an `evaluate` mode. Finally, there's also a little demo, which mostly exists to show
you how run-time usage will eventually look.
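
To make the idea behind the `evaluate` mode concrete, here is a hedged sketch: score each pair with whatever prediction function the freshly loaded model exposes, and only then compare the outputs against the gold labels. The names `evaluate` and `predict` are hypothetical; the real script may be organised differently.

```python
def evaluate(predict, pairs, gold_labels):
    """Return the accuracy of predict(premise, hypothesis) against gold_labels.

    predict is any callable returning a label; it never sees the gold labels.
    """
    if len(pairs) != len(gold_labels):
        raise ValueError('pairs and gold_labels must have the same length')
    if not pairs:
        return 0.0
    correct = 0
    for (premise, hypothesis), gold in zip(pairs, gold_labels):
        # The prediction is made first, then checked against the gold label.
        if predict(premise, hypothesis) == gold:
            correct += 1
    return correct / float(len(pairs))
```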

## Getting updates

We should have the blog post explaining the model ready before the end of the week. To get
notified when it's published, you can either follow me on [Twitter](https://twitter.com/honnibal),
or subscribe to our [mailing list](http://eepurl.com/ckUpQ5).