I was able to understand the basic idea behind RNNs after working through an example from the book by Antonio Gulli.

Data Download

The dataset comprises all the characters in the text of Alice's Adventures in Wonderland, available at Alice-Gutenberg. The basic idea of this exercise is to slice the text into 10-character input sequences and train a SimpleRNN to predict the character that follows each sequence.
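To make the windowing concrete, here is a minimal sketch on a toy string of my own, with SEQLEN shortened to 5 for readability (the exercise uses 10):

toy = "alice was beginning"
SEQLEN, STEP = 5, 1
pairs = [(toy[i:i + SEQLEN], toy[i + SEQLEN])
         for i in range(0, len(toy) - SEQLEN, STEP)]
# pairs[0] == ('alice', ' '), pairs[1] == ('lice ', 'w'), ...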

import urllib.request

url = "http://www.gutenberg.org/files/11/11-0.txt"
data = urllib.request.urlopen(url)
lines = []
for line in data:
    line = line.strip().lower()            # strip whitespace, normalize case
    line = line.decode('ascii', 'ignore')  # drop non-ASCII characters
    if len(line) > 0:
        lines.append(line)

text = " ".join(lines)                     # flatten into one long string
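A quick sanity check that the download worked (the exact character count depends on which revision of the Gutenberg file is served):

print(len(text))    # total characters in the flattened corpus
print(text[:60])    # the first few characters of the text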

Preprocessing Data - Building a Character Lookup

chars = set(c for c in text)   # distinct characters = the vocabulary
nb_chars = len(chars)

# Lookup tables between characters and integer indices
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for i, c in enumerate(chars)}

# Slide a SEQLEN-wide window over the text, one character at a time
SEQLEN = 10
STEP = 1
input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])

Preparing the Training Data

import numpy as np

X = np.zeros((len(input_chars), SEQLEN, nb_chars))
Y = np.zeros((len(input_chars), nb_chars))
for i, seq in enumerate(input_chars):
    for j, c in enumerate(seq):
        X[i, j, char2idx[c]] = 1           # one-hot encode each character
    Y[i, char2idx[label_chars[i]]] = 1     # one-hot encode the next character
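Because every character is one-hot encoded, no embedding layer is needed; the input is already a vector per timestep. A quick shape check:

print(X.shape)   # (number of sequences, SEQLEN, nb_chars)
print(Y.shape)   # (number of sequences, nb_chars)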

Defining the SimpleRNN Layer

from keras.models import Sequential
from keras.layers import Dense, SimpleRNN

HIDDEN_SIZE = 128   # size of the hidden state; the book's example uses 128

model = Sequential()
model.add(SimpleRNN(HIDDEN_SIZE, return_sequences=False,
                    input_shape=(SEQLEN, nb_chars),
                    unroll=True))
model.add(Dense(nb_chars, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
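With return_sequences=False the SimpleRNN emits only its final hidden state, which the Dense softmax maps to a probability distribution over the nb_chars possible next characters (unroll=True unrolls the ten timesteps, trading memory for speed). model.summary() shows the parameter counts:

model.summary()
# SimpleRNN parameters: nb_chars*HIDDEN_SIZE (input) + HIDDEN_SIZE*HIDDEN_SIZE (recurrent) + HIDDEN_SIZE (bias)
# Dense parameters:     HIDDEN_SIZE*nb_chars + nb_chars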

Train the Model

def get_test_input(word):
    # One-hot encode a single SEQLEN-character seed string
    X = np.zeros((1, SEQLEN, nb_chars))
    for i, c in enumerate(word):
        X[0, i, char2idx[c]] = 1
    return X

# Hyperparameters as in the book's example
BATCH_SIZE = 128
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100
NUM_ITERATIONS = 25

for iteration in range(NUM_ITERATIONS):
    model.fit(X, Y, batch_size=BATCH_SIZE,
              epochs=NUM_EPOCHS_PER_ITERATION, verbose=False)
    print("=" * 50)
    print(f"Iteration : {iteration}")
    # Pick a random seed sequence and generate characters from it
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    gen_chars = test_chars
    for i in range(NUM_PREDS_PER_EPOCH):
        X_test = get_test_input(test_chars)
        output_char = idx2char[np.argmax(model.predict(X_test))]
        gen_chars += output_char
        test_chars = test_chars[1:] + output_char  # slide the window forward
    print(gen_chars)
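Note that np.argmax always picks the single most likely character, which can make the generated text fall into repetitive loops. A common variant (not part of the book's example) is to sample from the softmax output with a temperature; a minimal sketch:

def sample_char(preds, temperature=0.5):
    # Sharpen/flatten the softmax output, renormalize, then sample
    preds = np.log(np.asarray(preds, dtype="float64") + 1e-8) / temperature
    probs = np.exp(preds) / np.sum(np.exp(preds))
    return idx2char[np.random.choice(len(probs), p=probs)]

# Drop-in replacement for the argmax line above:
# output_char = sample_char(model.predict(X_test)[0])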

Even though the model code looks simple, there is a lot going on behind it.
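For instance, at every timestep the SimpleRNN mixes the current input with its previous hidden state and squashes the result through tanh. A minimal NumPy sketch of that forward pass (the weight names are mine, not Keras's internals):

def simple_rnn_forward(x_seq, W_x, W_h, b):
    # x_seq: (SEQLEN, nb_chars); W_x: (nb_chars, HIDDEN_SIZE)
    # W_h: (HIDDEN_SIZE, HIDDEN_SIZE); b: (HIDDEN_SIZE,)
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:                         # one step per character
        h = np.tanh(x_t @ W_x + h @ W_h + b)  # new state from input + old state
    return h                                  # final state (return_sequences=False)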