4.4 Recurrent Neural Networks: Processing sequences#

Recurrent neural networks (RNNs) are commonly used for time-series prediction. Regular dense networks can also do this, and CNNs can work even for very long time series. A recurrent neuron receives the current input together with its own output from the previous time step. Because each neuron carries information forward from previous steps, it has memory; but these simple cells have a relatively short memory (on the order of 10 time steps). RNNs take in a sequence and output a sequence.

Figure (from D2L): an RNN with a hidden state. At each time step, the new hidden state is computed from the weighted current input plus the weighted previous hidden state (passed through an activation), and that hidden state is carried forward to the next time step.
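
As a minimal NumPy sketch of that update (all sizes and weights below are made up for illustration), the hidden state at each step mixes the weighted current input with the weighted previous hidden state through a tanh:

import numpy as np

# toy sizes (illustrative only): 1 input feature, 8 hidden units, 50 time steps
n_in, n_hidden, n_steps = 1, 8, 50
W_xh = np.random.randn(n_in, n_hidden) * 0.1      # input-to-hidden weights
W_hh = np.random.randn(n_hidden, n_hidden) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(n_hidden)

x = np.random.randn(n_steps, n_in)   # one input sequence
h = np.zeros(n_hidden)               # initial hidden state
for t in range(n_steps):
    # the same weights are reused at every time step
    h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)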

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow import keras

Synthetic data#

Let’s create a synthetic time series: a sum of two random sine waves plus noise.

def generate_time_series(batch_size,n_steps):
    f1,f2,off1,off2=np.random.rand(4,batch_size,1)
    t = np.linspace(0,1,n_steps)
    y = 0.5*np.sin( (t-off1)*(f1*10+10) ) # first wave
    y += 0.5*np.sin( (t-off2)*(f2*20+20) ) # second wave
    y += 0.3* (np.random.rand(batch_size,n_steps)-0.5) # noise
    return y[...,np.newaxis].astype(np.float32) 
# we add a trailing feature dimension so the output has shape (batch, time steps, features);
# Keras expects a feature axis even though this is a single (univariate) time series.

# we generate 10k time series of 51 points.
n_steps=50
y = generate_time_series(10000,n_steps+1)
plt.plot(y[5000,:]);plt.grid(True)

Train-validation-test split#

In a forecasting problem, we do not want to shuffle the data in time: the test set must be predicted from the past (the training period).

The training inputs are time series of 50 points; the “label” (model output) is the next, 51st value of each series.

x_train,y_train = y[:7000,:n_steps],y[:7000,-1] 
x_val,y_val = y[7000:9000,:n_steps],y[7000:9000,-1] 
x_test,y_test = y[9000:,:n_steps],y[9000:,-1] 
plt.plot(np.arange(n_steps),x_train[5000,:],'b')
plt.plot(n_steps,y_train[5000],'r+')  # the target: the value right after the 50 input steps
plt.grid(True)

The simplest baseline is naive forecasting: use the last observed value as the prediction of the next one:

y_pred=x_val[:,-1]
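
To put a number on this baseline (a quick check, using the validation arrays defined above), we can compute its mean squared error:

naive_mse = np.mean((y_pred - y_val) ** 2)  # MSE of the naive forecast
print(naive_mse)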

Or we can use a fully connected network and predict the next value as an MLP regression:

model = keras.models.Sequential([keras.layers.Flatten(input_shape=[50,1]),
                                 keras.layers.Dense(1)])
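
A minimal way to train and score this linear baseline, using the same optimizer, loss, and training settings as the recurrent models below (illustrative choices):

model.compile(optimizer='adam', loss='mse', metrics=['mse'])
history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    epochs=20, batch_size=128)
print(model.evaluate(x_val, y_val))  # [loss, mse] on the validation set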

These simple baselines do not do too badly on this problem. The basic recurrent building block in Keras is the SimpleRNN layer. The simplest recurrent model is a single keras.layers.SimpleRNN(1, input_shape=[None,1]): the time dimension is None because a recurrent layer can process any number of time steps, and the last dimension is 1 because each step carries a single feature. The default activation function is tanh. To return the full output sequence, and not just the final output, you need to set return_sequences=True. It turns out that a single recurrent neuron is not enough for this task, so we stack several SimpleRNN layers.
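
For reference, the single-neuron model just described would be defined like this (it trains, but underfits this problem):

model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])  # one recurrent neuron, tanh activation
    ])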

# three stacked recurrent layers; the final SimpleRNN(1) produces the single forecast value
model=keras.models.Sequential([
    keras.layers.SimpleRNN(20,input_shape=[None,1],return_sequences=True),
    keras.layers.SimpleRNN(20,return_sequences=True),
    keras.layers.SimpleRNN(1)
    ])
model.summary()
model.compile(optimizer='adam',loss='mse',metrics=['mse'])
history=model.fit(x_train,y_train,validation_data=(x_val,y_val), epochs=20, batch_size=128) 
pd.DataFrame(history.history).plot(figsize=(8,5))
plt.grid(True)
plt.xlabel('epochs')
# variant: replace the final SimpleRNN(1) with a Dense(1) output layer,
# which trains faster since the last recurrent layer only returns its final state
model=keras.models.Sequential([
    keras.layers.SimpleRNN(20,input_shape=[None,1],return_sequences=True),
    keras.layers.SimpleRNN(20),
    keras.layers.Dense(1)
    ])
model.summary()
model.compile(optimizer='adam',loss='mse',metrics=['mse'])
history=model.fit(x_train,y_train,validation_data=(x_val,y_val), epochs=20, batch_size=128) 
pd.DataFrame(history.history).plot(figsize=(8,5))
plt.grid(True)
plt.xlabel('epochs')
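
We can compare the trained RNN against the naive baseline on the validation set (a quick check using the arrays defined above):

rnn_mse = model.evaluate(x_val, y_val, verbose=0)[1]
naive_mse = np.mean((x_val[:, -1] - y_val) ** 2)
print(rnn_mse, naive_mse)   # the RNN should beat the naive forecast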

Forecast of several steps ahead: how far can you predict the future?#

We will try to predict 10 steps ahead. The early part of the forecast will be much better than the later part, since uncertainty grows with the forecast horizon.

# we generate 10k time series of 60 points: 50 input steps plus 10 steps to forecast.
n_steps=50
x = generate_time_series(10000,n_steps+10)
# sequence-to-sequence targets: at every time step, the target is the next 10 values
y=np.empty((10000,n_steps,10))
for step_ahead in range(1,10+1):
    y[:,:,step_ahead-1]=x[:,step_ahead:step_ahead+n_steps,0]

    
x_train=x[:7000,:n_steps]
x_val=x[7000:9000,:n_steps]
x_test=x[9000:,:n_steps]

y_train=y[:7000]
y_val=y[7000:9000]
y_test=y[9000:]
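
A quick sanity check of this target layout (using the arrays just built): at time step t, target channel k holds the series value k+1 steps after t, and the targets now have shape (batch, time steps, 10):

t, k = 12, 3
assert np.allclose(y[:, t, k], x[:, t + k + 1, 0])
print(x_train.shape, y_train.shape)   # (7000, 50, 1) and (7000, 50, 10)
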
# TimeDistributed(Dense(10)) applies the same Dense(10) at every time step,
# so the model outputs a 10-step forecast at each of the 50 input steps
model=keras.models.Sequential([
    keras.layers.SimpleRNN(20,input_shape=[None,1],return_sequences=True),
    keras.layers.SimpleRNN(20,return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
    ])
model.summary()
model.compile(optimizer='adam',loss='mse',metrics=['mse'])
history=model.fit(x_train,y_train,validation_data=(x_val,y_val), epochs=20, batch_size=128) 
y_pred=model.predict(x_test)
print(y_pred.shape)
print(x_test.shape)
print(y_test.shape)
plt.plot(np.arange(n_steps+10),x[9000,:])
plt.plot(np.arange(n_steps),x_test[0,:])
plt.plot(np.arange(10)+n_steps,y_pred[0,-1,:],'+')
plt.legend(('Truth','past','future'))
plt.grid(True)
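
To quantify how the error grows with the horizon (a quick check with the arrays above), we can compute the MSE of the forecast issued at the last input step, separately for each step ahead:

last_pred = y_pred[:, -1, :]   # 10-step forecast made at the last input step
last_true = y_test[:, -1, :]
mse_per_horizon = np.mean((last_pred - last_true) ** 2, axis=0)
print(mse_per_horizon)         # typically increases with the forecast horizon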

Problems with RNNs and solutions#

Simple RNNs have a problem during training with backpropagation through time: the gradients can become vanishingly small, so the earliest time steps stop contributing and the model effectively stops updating. This is called the vanishing gradient problem.
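
A toy illustration of why this happens (not the full backpropagation-through-time computation): the gradient through T time steps contains a product of T per-step factors, and if each factor is below 1 the product shrinks exponentially:

w = 0.9                      # per-step gradient factor (illustrative value)
for T in (1, 10, 50, 100):
    print(T, w ** T)         # 0.9, ~0.35, ~0.005, ~2.7e-05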

To remedy this, the LSTM architecture introduces a memory cell and gates that control what is stored, updated, and reset, which mitigates the vanishing-gradient problem.

2. LSTM#

Long Short-Term Memory (LSTM) cells are (somewhat complicated) recurrent cells that aim to solve the memory-loss issue.

Figure: the LSTM cell. An LSTM combines the hidden state from the previous time step, the memory held in the internal (cell) state, and the current input to produce the new hidden and internal states.
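
For reference, the standard LSTM update equations are (i, f, o are the input, forget, and output gates, g is the candidate cell update, σ is the sigmoid, and ⊙ is element-wise multiplication):

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)\\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)\\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)\\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$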

# same sequence-to-sequence architecture as before, with LSTM cells in place of SimpleRNN
model=keras.models.Sequential([
    keras.layers.LSTM(20,input_shape=[None,1],return_sequences=True),
    keras.layers.LSTM(20,return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
    ])
model.summary()
model.compile(optimizer='adam',loss='mse',metrics=['mse'])
history=model.fit(x_train,y_train,validation_data=(x_val,y_val), epochs=20, batch_size=128) 
y_pred=model.predict(x_test)
plt.plot(np.arange(n_steps+10),x[9000,:])
plt.plot(np.arange(n_steps),x_test[0,:])
plt.plot(np.arange(10)+n_steps,y_pred[0,-1,:],'+')
plt.legend(('Truth','past','future'))
plt.grid(True)
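
As with the SimpleRNN models, the training curves can be inspected in the same way:

pd.DataFrame(history.history).plot(figsize=(8,5))
plt.grid(True)
plt.xlabel('epochs')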