Forecasting Model with Exponential Smoothing

Abraham Luna
5 min read

This is the first article in a series presenting different forecasting models. We start with the Exponential Smoothing model provided by the statsmodels library.

The dataset is the Alphabet Inc. daily stock prices, which can be downloaded from Yahoo Finance. The data ranges from Jan-2020 to Dec-2023.

The target column is the Close column, which represents the stock price at the end of the day.

Here are the steps we will follow in each article:

  1. Data Preparation

  2. Decomposing The Data Set

  3. Training The Model

  4. Evaluate The Model

Exponential Smoothing

Exponential Smoothing is a time series forecasting method for univariate data, meaning it produces forecasts for a single variable of the time series.

One advantage of the ExponentialSmoothing class in statsmodels is that it implements the Holt-Winters method, which extends Holt’s Linear Trend Model with a seasonal component, so it can learn both the trend and the seasonality of the data. It is a popular alternative to the ARIMA model.
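To make the idea of smoothing concrete, here is a minimal sketch of simple exponential smoothing, the most basic member of the family (no trend, no seasonality). The function name, the toy prices and the choice to initialise with the first observation are assumptions made for illustration; this is not how statsmodels is implemented internally.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed series for a list of observations.

    alpha (0 < alpha <= 1) is the smoothing factor: values close to 1
    react quickly to new observations, values close to 0 smooth heavily.
    """
    smoothed = [values[0]]  # assumption: start from the first observation
    for observation in values[1:]:
        previous = smoothed[-1]
        # New level = weighted average of the new observation and the old level
        smoothed.append(alpha * observation + (1 - alpha) * previous)
    return smoothed


prices = [10.0, 12.0, 11.0, 13.0]  # hypothetical toy prices
smoothed = simple_exponential_smoothing(prices, alpha=0.5)
print(smoothed)  # [10.0, 11.0, 11.0, 12.0]
```

Each smoothed value is a weighted average of the newest observation and the previous smoothed value, so older observations receive exponentially decreasing weights.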

Data Preparation

Make sure to download the CSV file from Yahoo Finance and save it in the same directory as your Jupyter notebook.

Let's import all of the needed modules and packages.

# Import useful libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.tsa.api import ExponentialSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose

We read the dataset using pandas.

# Collect the dataset
df = pd.read_csv('GOOG.csv')
df.head()

We can observe that the dataset contains more than one column. We will use the others in later models; for now, let's focus on the Date and Close columns.

If the dataset contains nulls we need to replace them, although the downloaded file should already be clean.

# Check for nulls
df.info()
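If nulls do show up, one possible way to count and replace them is sketched below. The small `sample` frame is hypothetical and stands in for the downloaded prices, and forward-filling is just one reasonable choice for price data, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the downloaded prices
sample = pd.DataFrame(
    {"Close": [100.0, np.nan, 102.0, np.nan, 104.0]},
    index=pd.date_range("2023-01-02", periods=5, freq="D"),
)

# Count the nulls in each column
null_counts = sample.isna().sum()
print(null_counts["Close"])  # 2

# Replace each null with the last known value (forward fill)
filled = sample.ffill()
print(filled["Close"].tolist())  # [100.0, 100.0, 102.0, 102.0, 104.0]
```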

Now that we are sure that all records contain non-null values, we can use the Date column as our index.

# Use date column as index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

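# Remove duplicates based on index, keep first found
df = df[~df.index.duplicated(keep='first')]
df.sort_index(inplace=True)

# asfreq returns a new frame, so the result must be assigned.
# It adds rows for non-trading days; forward-filling them is an assumption.
df = df.asfreq('D').ffill()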

df.head()

Decomposing The Data Set

In this section we will reveal the trend, seasonality and residuals. Here is a description of each element:

  • Trend - represents the increasing or decreasing pattern over time

  • Seasonality - represents patterns that occur at regular intervals (e.g., daily, weekly, monthly, etc.)

  • Residuals - represents the noise or the random variation in the data

We can start by plotting the data so we can visually discover the trend and seasonality. We can include some vertical lines to identify each year on the data.

# Plot the data
vlines = ['2021-01-01', '2022-01-01', '2023-01-01']

df.plot(y='Close', figsize=(15,5))

for line in vlines:
  plt.axvline(x=line, color='black', linestyle='--')

plt.show()

From the plot above we can see that, except for 2022, stock prices tend to go up at the start of each year. Also, from 2022 onwards there seems to be more variation between days; in other words, there is more noise during this period.

It is required to identify the type of seasonality. There are two main types:

  • Additive - assumes that the seasonal changes have roughly the same magnitude over time, regardless of the level of the time series.

  • Multiplicative - suggests that the components interact in a multiplicative way. Changes in trend and seasonality are proportional to the level of the time series.
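The difference between the two types can be sketched with a synthetic series. All numbers below are made up for illustration: the additive series swings by a fixed amount around the trend, while the multiplicative series swings by a fixed percentage of it.

```python
import numpy as np

t = np.arange(24)
trend = 100 + 5 * t                # steadily increasing level
season = np.tile([1.0, -1.0], 12)  # toy pattern repeating every 2 steps

# Additive: the seasonal swing is always +/- 10 units
additive = trend + 10 * season

# Multiplicative: the seasonal swing is +/- 10% of the current level
multiplicative = trend * (1 + 0.1 * season)

additive_swing = np.abs(additive - trend)
multiplicative_swing = np.abs(multiplicative - trend)

print(additive_swing[0], additive_swing[-1])              # same size at both ends
print(multiplicative_swing[0], multiplicative_swing[-1])  # grows with the level
```

If the swings in your plot stay about the same size as the series grows, an additive model is usually the more natural choice; if they grow with the level, a multiplicative one is.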

I highly recommend reading this article to learn more.

In this article we will use the seasonal_decompose function from statsmodels to plot each component of the time series. It requires the type of seasonality, and in the previous plot the variations look additive rather than proportional to the level, so we use the additive model.

# Plot the decomposition
series = df[['Close']].copy()
decompose_result = seasonal_decompose(series, model="additive", period=365)

decompose_result.plot()

decompose_result.plot() draws four panels: the observed data, the trend, the seasonal component and the residuals.

Each of the plots helps us identify a component of the time series that will be useful for further analysis. Let's train a model.

Training The Model

We start by defining the forecast horizon, which contains the future dates we want to predict once the model is trained.

future = pd.date_range('2024-01-01', '2025-01-01', freq='D')

Now we define the Exponential Smoothing model by calling the ExponentialSmoothing() class and fitting it:

# Define and fit the Exponential Smoothing model
model = ExponentialSmoothing(df['Close'],
                             seasonal_periods=365,
                             trend='add',
                             seasonal='add').fit(optimized=True)

We can get the fitted values from the model with model.fittedvalues. This returns the values the model predicts on the training data. Let's create a column to store those values and compare them with the actual ones.

# Store the fitted values
df['fitted_values'] = model.fittedvalues

df.head()

If we plot both series, we can compare how well the model learned from the training data.

# Compare fitted values and actual values

df['fitted_values'].plot(style='o', color='red')
df['Close'].plot(figsize=(10,5), color='blue')
plt.show()

Evaluate The Model

There are many ways to evaluate a model using metrics, but we start with the basics in this notebook and keep it simple with a visual inspection. In future articles we will improve the evaluation techniques.

First, we predict over the forecast horizon, which is 180 days.
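As a small preview of what a metric-based evaluation could look like, here is a sketch of two common error measures, the mean absolute error (MAE) and the root mean squared error (RMSE). The `actual` and `predicted` arrays are made-up numbers, not results from the model in this article.

```python
import numpy as np

# Hypothetical actual and predicted values
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 100.0, 103.0, 105.0])

errors = actual - predicted

# MAE: average size of the errors, in the same unit as the data
mae = np.mean(np.abs(errors))

# RMSE: like MAE, but large errors are penalised more heavily
rmse = np.sqrt(np.mean(errors ** 2))

print(mae)   # 1.25
print(rmse)  # 1.5
```

Both measures are computed against values the model has not seen during training, which is why a proper evaluation usually holds out the last part of the series.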

# Forecast 180 days
pred = model.forecast(180)
pred

Pandas data frames are really useful for visualizing the predictions, so let's create one.

# Build the forecast dates; the forecast starts the day after the last training date
future = pd.date_range(df.index.max() + pd.DateOffset(days=1), periods=180, freq='D')

# Use the raw forecast values so they line up with the dates by position
forecast = pd.DataFrame({'Date': future, 'pred': pred.to_numpy()})

# Use date column as index
forecast['Date'] = pd.to_datetime(forecast['Date'])
forecast.set_index('Date', inplace=True)

forecast

Now we plot the predictions:

# Plot prediction values

plt.figure(figsize=(20,5))
plt.title('Forecast')
plt.plot(forecast['pred'], '--o', color='red')

plt.show()

If we visually inspect the forecast, we can see that the model did its best to follow the trend of the time series. Plotting it next to the actual values makes the comparison easier.

# Plot both values

plt.figure(figsize=(20,5))
plt.title('Forecast vs Actuals')
plt.plot(forecast['pred'], '--o', color='red')
plt.plot(df['Close'], '-o')

plt.show()

Next Steps

Exponential Smoothing works well for univariate time series forecasts, as it learns the level, trend and seasonality of the data. This first article did not focus on statistical methods to validate the output of the forecast. We will explore them in the next one.

Meanwhile, I highly recommend investigating:

  • Measures to evaluate forecasting models

  • What other models can be used for multivariate datasets?

Here is the full notebook for the code.

Enjoy training!

Written by

Abraham Luna

I am an Industrial Engineer, enthusiast in Data Science, Machine Learning and programming in general.