This post builds a plain vanilla LSTM in Keras and trains it to learn a simple pattern in a sequence.

The vanilla LSTM takes in a sequence [x1, x2, x3, ..., x10] and tries to learn that the output of the sequence is its second element, x2.

Data Preparation

import numpy as np
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

n_max     = 5      # digits 0..4, one-hot width
n_steps   = 10     # sequence length
n_samples = 1000
np.random.seed(1234)
X     = np.random.randint(0, n_max, n_samples * n_steps).reshape(n_samples, n_steps, 1)
X_cat = to_categorical(X, num_classes=n_max)   # shape (n_samples, n_steps, n_max)
Y     = X_cat[:, 1, :]                         # target: one-hot of the second step

The code above generates 1,000 sequences of 10 steps, each step a digit between 0 and 4. Both the inputs and the outputs are converted to one-hot vectors.
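As a quick sanity check, the same data preparation can be sketched with plain NumPy (using a minimal stand-in for `to_categorical` and a small toy batch), verifying that the target really is the one-hot encoding of the second step:

```python
import numpy as np

def to_one_hot(x, num_classes):
    # Minimal stand-in for keras.utils.to_categorical
    out = np.zeros(x.shape + (num_classes,))
    np.put_along_axis(out, x[..., None], 1.0, axis=-1)
    return out

n_max, n_steps, n_samples = 5, 10, 4
rng = np.random.default_rng(0)
X = rng.integers(0, n_max, size=(n_samples, n_steps, 1))
X_cat = to_one_hot(X[..., 0], n_max)      # shape (4, 10, 5)
Y = X_cat[:, 1, :]                        # one-hot of the second element

assert X_cat.shape == (n_samples, n_steps, n_max)
# Decoding the target recovers x2 for every sample
assert (np.argmax(Y, axis=1) == X[:, 1, 0]).all()
```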

Model Building

model = Sequential()
model.add(LSTM(25, input_shape=(n_steps, n_max)))   # 25 hidden units
model.add(Dense(n_max, activation='softmax'))       # one output per digit class
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 25)                3100      
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 130       
=================================================================
Total params: 3,230
Trainable params: 3,230
Non-trainable params: 0

The number of parameters in the LSTM layer is 4 × ((5 + 25 + 1) × 25) = 3,100: four gate blocks, each with weights for the 5 input features, weights for the 25 recurrent units, and 25 biases. The number of parameters in the Dense layer is (25 + 1) × 5 = 130.
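The arithmetic can be sketched in pure Python, no Keras required:

```python
n_input, n_hidden, n_classes = 5, 25, 5

# LSTM: 4 gates, each with weights for the input, weights for the
# recurrent state, and a bias vector of size n_hidden
lstm_params = 4 * ((n_input + n_hidden + 1) * n_hidden)

# Dense: one weight per hidden unit per class, plus a bias per class
dense_params = (n_hidden + 1) * n_classes

print(lstm_params, dense_params, lstm_params + dense_params)  # 3100 130 3230
```

These match the `Param #` column of `model.summary()` above.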

Model Training

n_epochs = 250
history = model.fit(X_cat, Y, validation_split=0.2, epochs=n_epochs)

# Generate fresh test data
X     = np.random.randint(0, n_max, n_samples * n_steps).reshape(n_samples, n_steps, 1)
X_cat = to_categorical(X, num_classes=n_max)
Y     = X_cat[:, 1, :]

loss, acc = model.evaluate(X_cat, Y)
print("Test loss: %.4f, accuracy: %.4f" % (loss, acc))
print("Number of examples correctly classified:",
      np.sum(np.argmax(model.predict(X_cat), axis=1) == np.argmax(Y, axis=1)))

Takeaways

  • Input is a sequence of steps
  • Model is a classification model
  • Five attractive properties of the vanilla LSTM:
    • Sequence classification conditional on multiple distributed input time steps.
    • Memory of precise input observations over thousands of time steps.
    • Sequence prediction as a function of prior time steps.
    • Robust to the insertion of random time steps on the input sequence.
    • Robust to the placement of signal data on the input sequence.