Introduction to LSTM

By: Guillaume De Gani

This notebook is an introduction to the use of LSTMs and CNNs on time series to make predictions. This technique has many applications, for example predicting the stock market or, in this case, the electricity consumption of a household.

Before we start working on a larger dataset it's important to have a basic understanding of how LSTMs and time series work. To do so we use a fairly simple dataset based on the number of passengers of an airline company.

Here we plot the number of passengers over time to get an idea of what our data looks like.
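
A minimal sketch of this step, assuming the classic `airline-passengers.csv` file with `Month` and `Passengers` columns (the file name and column names are assumptions, not necessarily those used in the notebook):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the monthly airline passengers dataset (file name is an assumption)
df = pd.read_csv("airline-passengers.csv", parse_dates=["Month"], index_col="Month")

# Plot the number of passengers over time
df["Passengers"].plot(figsize=(10, 4), title="Monthly airline passengers")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.show()
```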

Importing Libraries

Preparing the data

Before we can start making predictions and training our model it is important to prepare the data: we normalize it and remove the NaNs to avoid any errors. After doing so we divide it into a training set and a validation set to check how accurate our predictions are.
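
A hedged sketch of this preparation, assuming the single-column DataFrame loaded above and a 70/30 chronological split (the exact ratio is an assumption):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Drop NaNs, then scale the values to [0, 1] so the LSTM trains more stably
values = df["Passengers"].dropna().to_numpy().reshape(-1, 1).astype("float32")
scaler = MinMaxScaler(feature_range=(0, 1))
values = scaler.fit_transform(values)

# Chronological split: the first 70% for training, the rest for validation
split = int(len(values) * 0.7)
train, val = values[:split], values[split:]
```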

Here we create a simple model just to get an idea of the different parameters that are involved in LSTMs.
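
As a minimal sketch, assuming a Keras model with a lookback of 1 and a single LSTM layer (the unit count and other hyperparameters are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

lookback = 1  # number of past time steps fed to the network

model = Sequential([
    # 4 LSTM units reading windows of shape (lookback, 1 feature)
    LSTM(4, input_shape=(lookback, 1)),
    # One output: the predicted value at the next time step
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")
model.summary()
```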

Below we define a function that creates an LSTM network with a specific value for the lookback.
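
A sketch of such a function, together with a helper that turns the series into `(samples, lookback, 1)` windows; the names `build_lstm` and `create_dataset` are assumptions, not necessarily those used in the notebook:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def create_dataset(series, lookback):
    """Turn a 1-D series into (X, y) pairs where X holds `lookback` past values."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    X = np.array(X).reshape(-1, lookback, 1)  # (samples, time steps, features)
    return X, np.array(y)

def build_lstm(lookback, units=4):
    """Build a single-layer LSTM whose input window length is `lookback`."""
    model = Sequential([
        LSTM(units, input_shape=(lookback, 1)),
        Dense(1),
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model
```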

We create a list of models with different values for the lookback. By doing this we can later iterate through the models and compare them to choose the best lookback value.
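
For instance, with the (assumed) `build_lstm` helper sketched above, the list could be built like this; the candidate lookback values are assumptions:

```python
# Candidate lookback values to compare
lookbacks = [1, 3, 6, 12]
models = [(lookback, build_lstm(lookback)) for lookback in lookbacks]
```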

The graphs above show the results for the different lookback values, and two things are noticeable. First of all, a higher lookback value isn't necessarily correlated with an improvement in the prediction. Furthermore, since the model looks further back, a gap appears at the start of the prediction before the model has enough history to make one, which can be problematic.

Layer Comparison

In this section we create a list of models with varying numbers of LSTM layers in order to compare them.
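
A sketch of how such a list could be built, assuming a stacked-LSTM builder; the layer counts, unit sizes, and the name `build_stacked_lstm` are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_stacked_lstm(lookback, n_layers, units=4):
    """Stack `n_layers` LSTM layers; intermediate layers must return sequences."""
    model = Sequential()
    for i in range(n_layers):
        last = (i == n_layers - 1)
        if i == 0:
            model.add(LSTM(units, input_shape=(lookback, 1), return_sequences=not last))
        else:
            model.add(LSTM(units, return_sequences=not last))
    model.add(Dense(1))
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model

# Compare 1 to 4 stacked LSTM layers with the same lookback
layer_models = [(n, build_stacked_lstm(lookback=3, n_layers=n)) for n in range(1, 5)]
```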

Increasing the number of LSTM layers makes the model more focused on the "past"; the consequence of this is that the model doesn't properly predict the increase in the number of passengers as time passes. This can be observed in the last graph, where we use 4 LSTM layers and the model consistently underestimates the value compared to the ground truth.

The results are fairly similar to the ones we got above; however, this structure will be useful when we try to train the CNN in the following part of the project.

To fill in the NaNs we take the value at the same time on the previous day. Note: I tried averaging the two closest values, but it is fairly common for NaNs to follow each other.
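
A possible pandas sketch of this imputation, assuming the household-consumption DataFrame is indexed by a DatetimeIndex:

```python
# Replace each NaN with the value recorded at the same time on the previous day.
# Shifting the index forward by one day aligns every timestamp with the value
# measured 24 hours earlier; if that value is also NaN, the gap remains.
df = df.fillna(df.shift(freq="D"))
```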

Preparing the data

Before testing our models I decided to reduce the size of the data. To do so, we merge the data day by day by summing the values of each day. This reduces the size of the data by a large amount without losing too much information regarding the evolution of electricity consumption over the months and years.
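
A sketch of the day-by-day aggregation, again assuming a DataFrame indexed by timestamp:

```python
# Aggregate the minute-level measurements into one row per day by summing them
daily_df = df.resample("D").sum()
```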

Below we create a list of models with varying numbers of layers and lookback values. A notable thing here is the larger lookback values, since it's possible that looking at what happened the day before could be very helpful for making a prediction.
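
Reusing the (assumed) `build_stacked_lstm` helper sketched earlier, the grid of candidate models could look like this; the specific lookbacks and layer counts are assumptions:

```python
# Larger lookbacks so the model can see up to several weeks of daily history
lookbacks = [7, 14, 30]
layer_counts = [1, 2, 3]

candidate_models = [
    (lookback, n_layers, build_stacked_lstm(lookback, n_layers))
    for lookback in lookbacks
    for n_layers in layer_counts
]
```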

To evaluate the different models and their efficiency at predicting the future, we iterate through the models and select the one with the smallest RMSE, which indicates that it was the best model from the list that was given. It's important to understand that this method is fairly time-consuming; however, finding the right structure for our data is usually done by trial and error.
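
A minimal sketch of this selection loop, assuming the (hypothetical) `create_dataset` helper, the `train`/`val` split, and the `candidate_models` list from the earlier sketches; the training hyperparameters are assumptions:

```python
import numpy as np

best_model, best_rmse = None, np.inf

for lookback, n_layers, model in candidate_models:
    # Build the windows for this lookback, train, then score on the validation set
    X_train, y_train = create_dataset(train, lookback)
    X_val, y_val = create_dataset(val, lookback)
    model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0)

    predictions = model.predict(X_val, verbose=0).ravel()
    rmse = np.sqrt(np.mean((predictions - y_val.ravel()) ** 2))

    if rmse < best_rmse:
        best_model, best_rmse = model, rmse

print(f"Best RMSE: {best_rmse:.3f}")
```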

The results are interesting since the different datasets we study require different LSTM structures. Furthermore, the results are fairly promising since they are quite similar to the ground truth, as seen in the test data predictions. The third graph is interesting since the model can fairly accurately predict the drops in intensity.

1D-CNN

The idea here is to use a window that will be compressed to a lower dimension using a 1-D convolution; at the end the model will try to give a prediction. The prediction is made from the n_steps previous points, if available.
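
A sketch of such a 1-D CNN, assuming `n_steps` past points per window and a single input feature (the filter sizes and layer widths are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

n_steps = 7  # number of previous points used for each prediction (an assumption)

cnn = Sequential([
    # Compress the window with a 1-D convolution, then pool it to a lower dimension
    Conv1D(filters=64, kernel_size=2, activation="relu", input_shape=(n_steps, 1)),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(50, activation="relu"),
    Dense(1),  # single-step prediction
])
cnn.compile(loss="mean_squared_error", optimizer="adam")
```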

The results show that the prediction is not very accurate on a per-prediction basis. However, the model is usually able to predict the general tendency of the data that we are studying. Note that it would have been possible to use more features to make the prediction, for example using both the Global_active_power and the Intensity to predict the next value. The LSTM seems to be better since its prediction is closer to the ground truth.