Forecast of Natural Gas Price with Deep Learning

10 minute read

Hello everyone, today I am going to predict the Natural Gas price using time series forecasting with Long Short-Term Memory (LSTM) neural networks. This project can be used to predict any time series data. In this blog we are going to give you a real prediction of the gas price for the next week.

The financial data source that we will use is investpy, one of the most complete Python packages for financial data extraction. It lets us stop relying on public/private APIs, since investpy is free and has no limitations, and it is a good tool for financial data retrieval.

Step 1. Creation of the environment

Installation of Conda

First you need to install Anaconda from this link.


I will create an environment called forecast, but you can choose any name you like.

conda create -n forecast python=3.7

If you are running Anaconda for the first time, you should initialize conda for the shell you want to work with; in this case I choose cmd.exe:

conda init cmd.exe

Then close and reopen the terminal, and activate the environment:

conda activate forecast

Then, in your terminal, type the following commands:

conda install ipykernel
python -m ipykernel install --user --name forecast --display-name "Python (Forecast)"

Then we install TensorFlow:

pip install tensorflow==2.9.0

and Keras

pip install keras==2.9.0

If you are going to work on forecasting projects, I suggest installing some additional libraries:

pip install statsmodels pandas matplotlib scikit-learn plotly nbformat seaborn

We also need to install investpy, which allows the user to download both recent and historical data for all the financial products indexed on Investing.com.

We type:

pip install investpy 
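
To check that the environment is ready, here is a quick sanity check you can run (a minimal sketch of my own; the TensorFlow version printed should match what we installed above):

import tensorflow as tf
import pandas as pd
import investpy  # just verifying that it imports without errors

print("TensorFlow:", tf.__version__)  # expected: 2.9.0 in this setup
print("pandas:", pd.__version__)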

Step 2. Recent and Historical Data Retrieval

The following notebook details the process of training a neural network with LSTM models.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf 
from statsmodels.tsa.seasonal import seasonal_decompose 
import matplotlib.pyplot as plt                       
from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse
import warnings
warnings.filterwarnings("ignore")
import time
from datetime import date
from datetime import datetime
today = date.today()
print("Today's date:", today)
to_today = datetime.strptime(str(today), '%Y-%m-%d').strftime('%d/%m/%Y')
to_date = datetime.strptime(str(today), '%Y-%m-%d').strftime('%Y-%m-%d')
Today's date: 2022-07-11
def addonDays(a, x):
    # shift the date string a (format "%Y-%m-%d") by x days; the extra 3600 s keeps the result clear of midnight across DST changes
    ret = time.strftime("%Y-%m-%d",time.localtime(time.mktime(time.strptime(a,"%Y-%m-%d"))+x*3600*24+3600))
    return ret
week_ago=addonDays(to_date, -7)
week_ago
'2022-07-04'
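
As a side note, the same date arithmetic can be written more directly with datetime.timedelta; a minimal sketch of my own that mirrors addonDays:

from datetime import datetime, timedelta

def add_days(a, x):
    # shift an ISO date string by x days (negative x goes back in time)
    return (datetime.strptime(a, "%Y-%m-%d") + timedelta(days=x)).strftime("%Y-%m-%d")

print(add_days("2022-07-11", -7))  # '2022-07-04'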

Searching the Financial Data

The search function allows the user to tune its parameters to adjust the results to their needs; both the product types and the countries the products belong to can be specified.

For this project we are looking for the price of Natural Gas, but the same code can be used to predict the prices of Crude Oil, Gold, Silver, Copper, and so on.

In the text parameter we write what we want to search for; in this project we use Natural Gas. Note that, since we restrict the products to stocks, the first result is the stock of Northwest Natural Gas Co (NWN) rather than the commodity itself.

import investpy

gas_result = investpy.search_quotes(text='Natural Gas', products=['stocks'],
                                       countries=['united states'], n_results=1)
print(gas_result)
{"id_": 20413, "name": "Northwest Natural Gas Co", "symbol": "NWN", "country": "united states", "tag": "/equities/northwest-natural-gas-comp", "pair_type": "stocks", "exchange": "NYSE"}

With n_results greater than 1 the retrieved search results come back as a list; here we get a single result, and a search result lets the user retrieve both recent and historical data:

recent_data = gas_result.retrieve_recent_data()
print(recent_data.head())
             Open   High    Low  Close   Volume  Change Pct
Date                                                       
2022-06-13  53.31  53.56  51.28  51.51   214539       -4.77
2022-06-14  51.19  51.70  50.10  50.81   257471       -1.36
2022-06-15  50.94  51.63  50.51  50.81   337460        0.00
2022-06-16  50.28  51.01  49.69  50.86   428673        0.10
2022-06-17  51.43  51.99  50.56  51.69  1419987        1.63

The search result also exposes the historical data, the company information, the default currency, and the technical indicators, as shown in the code below:

historical_data = gas_result.retrieve_historical_data(from_date='01/01/2022', to_date=to_today)
print(historical_data.head())
             Open   High    Low  Close  Volume  Change Pct
Date                                                      
2022-01-03  48.90  49.37  48.21  48.77  181573       -0.02
2022-01-04  49.10  49.79  49.08  49.35  131969        1.19
2022-01-05  49.47  49.89  49.24  49.57  126793        0.45
2022-01-06  49.72  49.74  49.27  49.54  130718       -0.06
2022-01-07  49.55  49.98  49.17  49.80  131915        0.52
information = gas_result.retrieve_information()
print(information)
{'prevClose': 52.06, 'dailyRange': '51.8-52.79', 'revenue': 894760000, 'open': 52.26, 'weekRange': '43.07-57.63', 'eps': 2.42, 'volume': 156807, 'marketCap': 1790000000, 'dividend': '1.93(3.71%)', 'avgVolume': 282206, 'ratio': 21.53, 'beta': 0.45, 'oneYearReturn': '0.12%', 'sharesOutstanding': 34255926, 'nextEarningDate': '11/08/2022'}
default_currency = gas_result.retrieve_currency()
print(default_currency)
USD
technical_indicators = gas_result.retrieve_technical_indicators(interval='daily')
print(technical_indicators)
              indicator           signal    value
0               RSI(14)          neutral  48.9150
1            STOCH(9,6)          neutral  45.0020
2          STOCHRSI(14)         oversold  19.5070
3           MACD(12,26)              buy   0.2100
4               ADX(14)          neutral  11.1960
5           Williams %R             sell -60.4200
6               CCI(14)          neutral -39.8620
7               ATR(14)  less_volatility   1.3331
8        Highs/Lows(14)          neutral   0.0000
9   Ultimate Oscillator              buy  51.6290
10                  ROC              buy   1.4510
11  Bull/Bear Power(13)             sell  -0.9120
import investpy

hist = investpy.get_stock_historical_data(stock='NWN',
                                        country='United States',
                                        from_date='01/01/2022',
                                        to_date=to_today)
print(hist.head())
             Open   High    Low  Close  Volume Currency
Date                                                   
2022-01-03  48.90  49.37  48.21  48.77  181573      USD
2022-01-04  49.10  49.79  49.08  49.35  131969      USD
2022-01-05  49.47  49.89  49.24  49.57  126793      USD
2022-01-06  49.72  49.74  49.27  49.54  130718      USD
2022-01-07  49.55  49.98  49.17  49.80  131915      USD
import plotly.graph_objects as go
fig = go.Figure(data=go.Scatter(x=hist.index,y=hist['Close'], mode='lines'))
fig.update_layout(title={'text':gas_result.name+' ('+ default_currency +')', 'x':0.5})
fig.show()

Next we draw a 20-day moving average indicator. pandas provides convenient ways to calculate time-series metrics such as the moving average: the df.rolling() method provides “moving windows” that we can operate on, and to get the average over the moving window we just append .mean() to rolling().
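
For instance, a minimal standalone sketch of the 20-day moving average on our Close column:

ma20 = hist['Close'].rolling(window=20).mean()
print(ma20.tail())  # the first 19 values are NaN because the window is incomplete

The full chart below combines this moving average with a candlestick plot and the volume: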

from plotly.subplots import make_subplots
fig2 = make_subplots(specs=[[{"secondary_y": True}]])
fig2.add_trace(go.Candlestick(x=hist.index,
                              open=hist['Open'],
                              high=hist['High'],
                              low=hist['Low'],
                              close=hist['Close'],
                             ))
fig2.add_trace(go.Scatter(x=hist.index,y=hist['Close'].rolling(window=20).mean(),marker_color='blue',name='20 Day MA'))
fig2.add_trace(go.Bar(x=hist.index, y=hist['Volume'], name='Volume'),secondary_y=True)
fig2.update_layout(title={'text':gas_result.name+' ('+ default_currency +')', 'x':0.5})
fig2.update_yaxes(range=[0,1000000000],secondary_y=True)
fig2.update_yaxes(visible=False, secondary_y=True)
fig2.update_layout(xaxis_rangeslider_visible=False)  #hide range slider
fig2.show()

Step 3. Creation of the Dataframe

hist['Date'] = hist.index
df = hist[["Date", "Close"]]
df=df.reset_index(drop=True)
df.head()
Date Close
0 2022-01-03 48.77
1 2022-01-04 49.35
2 2022-01-05 49.57
3 2022-01-06 49.54
4 2022-01-07 49.80
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    130 non-null    datetime64[ns]
 1   Close   130 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 2.2 KB
df.Date = pd.to_datetime(df.Date)
df = df.set_index("Date")
df.head()
Close
Date
2022-01-03 48.77
2022-01-04 49.35
2022-01-05 49.57
2022-01-06 49.54
2022-01-07 49.80
ax = df['Close'].plot(figsize = (16,5), title = gas_result.name+' ('+ default_currency +')')
ax.set(xlabel='Dates', ylabel=' ('+ default_currency +')');

This is the plot of the closing prices of Northwest Natural Gas. [Figure: daily closing price]

Step 4. LSTM model training

Pre-processing data with MinMaxScaler

from sklearn.model_selection import train_test_split

# keep the chronological order: time series data must not be shuffled before splitting
train_data, test_data = train_test_split(df, test_size=0.2, shuffle=False)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(train_data)
scaled_train_data = scaler.transform(train_data)
scaled_test_data = scaler.transform(test_data)

Before creating the LSTM model, we need to create a Time Series Generator object.

from keras.preprocessing.sequence import TimeseriesGenerator
# Let's define the generator to look 24 days back and then predict the next day out
n_input = 24
n_features= 1
generator = TimeseriesGenerator(scaled_train_data, scaled_train_data, 
                                length=n_input, batch_size=1)

Example of transformation

X,y = generator[0]
print(f'Given the Array: \n{X.flatten()}')
print(f'Predict this y: \n {y}')
Given the Array: 
[0.31456311 0.52427184 0.36116505 0.25631068 0.2368932  0.79126214
 0.82427184 0.1368932  0.91067961 0.3592233  0.81941748 0.82330097
 0.52524272 0.6368932  1.         0.11747573 0.12135922 0.61553398
 0.70291262 0.14660194 0.54563107 0.20873786 0.90194175 0.08446602]
Predict this y: 
 [[0.39223301]]
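
To make explicit what the generator does, here is an equivalent NumPy sliding-window construction (a sketch of my own for illustration, not part of the original notebook):

# build all (window, next value) pairs from the scaled training series
def make_windows(series, length):
    X = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    return X, y

X_all, y_all = make_windows(scaled_train_data, n_input)
print(X_all.shape, y_all.shape)  # (n - 24, 24, 1) and (n - 24, 1)

X_all[0] and y_all[0] match generator[0] up to batching, and the number of windows is what gives the 80 steps per epoch seen in the training logs below.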

Long short-term memory (LSTM)

Long short-term memory is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network can process not only single data points, but also entire sequences of data.

We are going to present two LSTM models: one with a single LSTM layer, and one with stacked LSTM layers (the "Double" model).
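
Before building them, here is a quick shape check (a minimal sketch of my own, not part of the original notebook): an LSTM layer consumes batches of shape (batch, timesteps, features) and, without return_sequences, emits one vector per sequence.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

demo = Sequential([LSTM(8, input_shape=(24, 1))])  # 24 timesteps, 1 feature
x = np.random.rand(5, 24, 1)                       # a batch of 5 sequences
print(demo.predict(x).shape)                       # (5, 8): one 8-dim vector per sequence

This matches the input_shape=(n_input, n_features) we pass to our models below.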

Step 5. Single LSTM Model

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
lstm_model = Sequential()
lstm_model.add(LSTM(50, activation='relu', input_shape=(n_input, n_features)))
lstm_model.add(Dense(1))
lstm_model.compile(optimizer='adam', loss='mse')
lstm_model.summary()
WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 50)                10400     
                                                                 
 dense (Dense)               (None, 1)                 51        
                                                                 
=================================================================
Total params: 10,451
Trainable params: 10,451
Non-trainable params: 0
_________________________________________________________________
lstm_model.fit(generator, epochs=20)  # .fit accepts generators in TF 2; fit_generator is deprecated
Epoch 1/20
80/80 [==============================] - 9s 97ms/step - loss: 0.1017
Epoch 2/20
.
.
.
80/80 [==============================] - 8s 96ms/step - loss: 0.0722
Epoch 20/20
80/80 [==============================] - 9s 107ms/step - loss: 0.0717

<keras.callbacks.History at 0x1bf63d0e518>
losses_lstm = lstm_model.history.history['loss']
plt.figure(figsize=(12,4))
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.xticks(np.arange(0,21,1))
plt.plot(range(len(losses_lstm)),losses_lstm);

[Figure: training loss per epoch for the single LSTM model]

Step 6. Prediction on test data

Next we are going to make one prediction for each day in the test set (len(test_data) predictions). To do this we will do the following:

  • create an empty list to hold our predictions
  • create the batch that our model will predict from
  • save each prediction in our list
  • add the prediction to the end of the batch so it can be used for the next prediction
lstm_predictions_scaled = list()

batch = scaled_train_data[-n_input:]
current_batch = batch.reshape((1, n_input, n_features))

for i in range(len(test_data)):   
    lstm_pred = lstm_model.predict(current_batch)[0]
    lstm_predictions_scaled.append(lstm_pred) 
    current_batch = np.append(current_batch[:,1:,:],[[lstm_pred]],axis=1)
1/1 [==============================] - 0s 164ms/step
.
.
.
1/1 [==============================] - 0s 20ms/step
1/1 [==============================] - 0s 25ms/step

Since we scaled our data, we have to invert the transformation to see the true predictions.

lstm_predictions_scaled
[array([0.41446185], dtype=float32),
 array([0.4108159], dtype=float32),
.
.
.
 array([0.40563506], dtype=float32)]
lstm_predictions = scaler.inverse_transform(lstm_predictions_scaled)
lstm_predictions
array([[50.36895707],
       [50.33140371],
.
.
.
       [50.27945744],
       [50.27830387],
       [50.27804111]])
test_data['LSTM_Predictions'] = lstm_predictions
test_data['Close'].plot(figsize = (16,5), legend=True)
test_data['LSTM_Predictions'].plot(legend = True);

[Figure: actual Close vs. LSTM predictions on the test data]

lstm_rmse_error = rmse(test_data['Close'], test_data["LSTM_Predictions"])
lstm_mse_error = lstm_rmse_error**2
mean_value = df['Close'].mean()

print(f'MSE Error: {lstm_mse_error}\nRMSE Error: {lstm_rmse_error}\nMean: {mean_value}')
MSE Error: 7.840908820950428
RMSE Error: 2.8001622847525156
Mean: 50.96569230769231
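
To put these errors in context, a sanity check I suggest (my own addition, not in the original notebook) is a naive persistence baseline that forecasts every test day with the last known training close; a useful model should beat it:

# naive baseline: repeat the last training close for every test day
naive_pred = np.repeat(train_data['Close'].iloc[-1], len(test_data))
naive_rmse = rmse(test_data['Close'], naive_pred)
print(f'Naive RMSE: {naive_rmse}')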

Step 7. Double LSTM Model

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.layers import Dropout

lstm_model = Sequential()
lstm_model.add(LSTM(200, activation='relu',return_sequences=True,
                    input_shape=(n_input, n_features)))
lstm_model.add(LSTM(200, return_sequences=True))
lstm_model.add(Dropout(rate=0.2))
lstm_model.add(LSTM(200, return_sequences=False))
lstm_model.add(Dense(1))

mc = ModelCheckpoint('double_model_lstm.h5', monitor='val_loss', mode='min', 
                     verbose=1, save_best_only=True)

lstm_model.summary()
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_1 (LSTM)               (None, 24, 200)           161600    
                                                                 
 lstm_2 (LSTM)               (None, 24, 200)           320800    
                                                                 
 dropout (Dropout)           (None, 24, 200)           0         
                                                                 
 lstm_3 (LSTM)               (None, 200)               320800    
                                                                 
 dense_1 (Dense)             (None, 1)                 201       
                                                                 
=================================================================
Total params: 803,401
Trainable params: 803,401
Non-trainable params: 0
_________________________________________________________________
lstm_model.compile(optimizer='adam', loss='mse')
lstm_model.fit(generator, epochs=20)
Epoch 1/20
80/80 [==============================] - 14s 116ms/step - loss: 0.1063
Epoch 2/20
80/80 [==============================] - 10s 122ms/step - loss: 0.0803
Epoch 3/20
.
.
.
Epoch 19/20
80/80 [==============================] - 11s 133ms/step - loss: 0.0735
Epoch 20/20
80/80 [==============================] - 11s 139ms/step - loss: 0.0744

<keras.callbacks.History at 0x1bf6a9dae48>

Step 8. EarlyStopping and Validation Generator

from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',
                           patience=12)
validation_generator = TimeseriesGenerator(scaled_test_data,scaled_test_data, 
                                           length=n_input)
lstm_model.compile(optimizer='adam', 
              loss='mse')
# fit model
lstm_model.fit(generator, epochs=20,
               validation_data=validation_generator,
               callbacks=[early_stop, mc])
Epoch 1/20
80/80 [==============================] - ETA: 0s - loss: 0.0759
Epoch 1: val_loss improved from inf to 0.08796, saving model to double_model_lstm.h5
80/80 [==============================] - 15s 145ms/step - loss: 0.0759 - val_loss: 0.0880
Epoch 2/20
80/80 [==============================] - ETA: 0s - loss: 0.0773
.
.
.
Epoch 15/20
80/80 [==============================] - ETA: 0s - loss: 0.0694
Epoch 15: val_loss did not improve from 0.04324
80/80 [==============================] - 9s 117ms/step - loss: 0.0694 - val_loss: 0.2474

<keras.callbacks.History at 0x1bf6a830160>

Learning curve

losses = pd.DataFrame(lstm_model.history.history)
losses.plot()
<AxesSubplot:>

[Figure: training and validation loss curves]

from keras.models import load_model

# reload the best checkpoint saved by the ModelCheckpoint callback during training
lstm_model = load_model('double_model_lstm.h5', compile=False)
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.

Step 9. Prediction on test data

lstm_predictions_scaled = list()

batch = scaled_train_data[-n_input:]
current_batch = batch.reshape((1, n_input, n_features))

for i in range(len(test_data)):   
    # get prediction 1 time stamp ahead ([0] is for grabbing just the number instead of [array])
    lstm_pred = lstm_model.predict(current_batch)[0]
    
    # store prediction
    lstm_predictions_scaled.append(lstm_pred) 
    
    # update batch to now include prediction and drop first value
    current_batch = np.append(current_batch[:,1:,:],[[lstm_pred]],axis=1)
1/1 [==============================] - 1s 684ms/step
1/1 [==============================] - 0s 27ms/step
.
.
.
1/1 [==============================] - 0s 30ms/step
1/1 [==============================] - 0s 25ms/step

Since we scaled our data, we have to invert the transformation to see the true predictions.

Reverse the transformation

lstm_predictions = scaler.inverse_transform(lstm_predictions_scaled)
test_data['LSTM_Predictions'] = lstm_predictions
test_data['Close'].plot(figsize = (16,5), legend=True)
test_data['LSTM_Predictions'].plot(legend = True);

[Figure: actual Close vs. double LSTM predictions on the test data]

lstm_rmse_error = rmse(test_data['Close'], test_data["LSTM_Predictions"])
lstm_mse_error = lstm_rmse_error**2
mean_value = df['Close'].mean()

print(f'MSE Error: {lstm_mse_error}\nRMSE Error: {lstm_rmse_error}\nMean: {mean_value}')
MSE Error: 6.697523292919587
RMSE Error: 2.587957359177231
Mean: 50.96569230769231

Step 10. Real Prediction of the Gas Price for the Next Week

Let us assume that we want to predict the gas price for the next week.

df.head()
Close
Date
2022-01-03 48.77
2022-01-04 49.35
2022-01-05 49.57
2022-01-06 49.54
2022-01-07 49.80
full_scaler = MinMaxScaler()
scaled_full_data = full_scaler.fit_transform(df)
length = 7 # Length of the input sequences, i.e. the lookback window (in number of timesteps)
generator = TimeseriesGenerator(scaled_full_data, scaled_full_data, length=length, batch_size=1)
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(length, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')


# fit model
model.fit(generator, epochs=8)
WARNING:tensorflow:Layer lstm_4 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Epoch 1/8
123/123 [==============================] - 6s 43ms/step - loss: 0.0634
.
.
.
Epoch 8/8
123/123 [==============================] - 5s 39ms/step - loss: 0.0135

<keras.callbacks.History at 0x1c13b4f9b38>
forecast = []
# Replace periods with whatever forecast length you want
#one week
periods = 7

first_eval_batch = scaled_full_data[-length:]
current_batch = first_eval_batch.reshape((1, length, n_features))

for i in range(periods):
    
    # get prediction 1 time stamp ahead ([0] is for grabbing just the number instead of [array])
    current_pred = model.predict(current_batch)[0]
    
    # store prediction
    forecast.append(current_pred) 
    
    # update batch to now include prediction and drop first value
    current_batch = np.append(current_batch[:,1:,:],[[current_pred]],axis=1)
1/1 [==============================] - 0s 140ms/step
1/1 [==============================] - 0s 21ms/step
1/1 [==============================] - 0s 17ms/step
1/1 [==============================] - 0s 17ms/step
1/1 [==============================] - 0s 23ms/step
1/1 [==============================] - 0s 19ms/step
1/1 [==============================] - 0s 18ms/step
forecast = full_scaler.inverse_transform(forecast)  # use full_scaler, which was fitted on the full dataset
test_data.tail()
Close LSTM_Predictions
Date
2022-01-21 47.54 51.572930
2022-04-21 51.55 51.572321
2022-06-08 54.70 51.570076
2022-03-14 54.56 51.570197
2022-05-18 51.07 51.569831
week_ago=addonDays(to_date, -7)
next_week=addonDays(to_date, 7)
next_month=addonDays(to_date, 30)
forecast_index = pd.date_range(start=to_date,periods=periods,freq='D')
forecast_index
DatetimeIndex(['2022-07-11', '2022-07-12', '2022-07-13', '2022-07-14',
               '2022-07-15', '2022-07-16', '2022-07-17'],
              dtype='datetime64[ns]', freq='D')
forecast_df = pd.DataFrame(data=forecast,index=forecast_index,
                           columns=['Forecast'])
forecast_df
Forecast
2022-07-11 52.077826
2022-07-12 51.977773
2022-07-13 51.993099
2022-07-14 51.897952
2022-07-15 51.862930
2022-07-16 51.860414
2022-07-17 51.833954
ax = df.plot()
forecast_df.plot(ax=ax)
<AxesSubplot:xlabel='Date'>

[Figure: historical Close with the 7-day forecast appended]

month_ago=addonDays(to_date, -30)
ax = df.plot()
forecast_df.plot(ax=ax)
plt.xlim(month_ago,next_week)
(19154.0, 19191.0)

[Figure: zoom on the last month plus the forecast week]

fig3 = make_subplots(specs=[[{"secondary_y": True}]])
fig3.add_trace(go.Scatter(x=forecast_df.index,y=forecast_df['Forecast'],name='Forecast'))
fig3.add_trace(go.Scatter(x=df.index,y=df['Close'],name='Close'))
fig3.update_layout(title={'text':gas_result.name+' ('+ default_currency +')', 'x':0.5})
fig3.update_yaxes(range=[0,1000000000],secondary_y=True)
fig3.update_yaxes(visible=False, secondary_y=True)
fig3.update_layout(xaxis_rangeslider_visible=False)  #hide range slider
fig3.show()
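
If you want to keep the forecast for later comparison (my own addition; the file name is arbitrary), a one-line export:

forecast_df.to_csv('gas_forecast_' + to_date + '.csv')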

You can download the notebook here or you can run it on Google Colab.

Congratulations! We have created a neural network using LSTM and we have predicted the Natural Gas price for the next week.
