Understanding Stationarity and the Dickey-Fuller Test in Time Series Analysis: An In-Depth Guide

In time series analysis, stationarity is a fundamental concept that significantly impacts the modeling and forecasting of data. This guide will delve into the concepts of stationarity and non-stationarity, the Dickey-Fuller test for stationarity, methods to convert non-stationary data to stationary, and techniques to detect autocorrelation and seasonality. We'll include practical Python code snippets to illustrate these concepts.


Table of Contents

  1. Introduction to Stationarity

  2. Non-Stationarity

  3. Dickey-Fuller Test

  4. Converting Non-Stationary Series to Stationary

  5. Detecting Autocorrelation and Seasonality

  6. Practical Implementation with Python

  7. Conclusion

  8. References


Introduction to Stationarity

What is Stationarity?

A stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, are constant over time. In other words, the series does not exhibit trends, seasonality, or other structures that change over time.

Types of Stationarity

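  • Strict (Strong) Stationarity: The joint probability distribution of any set of observations is invariant to time shifts. As a result, every statistical property stays the same over time, not just the mean and variance.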

  • Weak (Second-Order) Stationarity: The mean and variance are constant over time, and the covariance between two time periods depends only on the lag between them, not on the actual time at which the covariance is computed.
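The two definitions can be written in symbols using standard notation. The symbols \(\mu\), \(\sigma^2\), and \(\gamma(h)\) are introduced here for illustration. A series \(y_t\) is weakly stationary when, for all times \(t\) and lags \(h\):

$$
\mathbb{E}[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2 < \infty, \qquad \operatorname{Cov}(y_t,\, y_{t+h}) = \gamma(h).
$$

Strict stationarity is a stronger condition. For every \(k\), every set of times \(t_1, \dots, t_k\), and every shift \(h\), the two vectors below must have the same joint distribution:

$$
(y_{t_1}, \dots, y_{t_k}) \overset{d}{=} (y_{t_1+h}, \dots, y_{t_k+h}).
$$

A strictly stationary series with finite variance is also weakly stationary. The converse holds for Gaussian processes, whose distribution is fully determined by the mean and covariances.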

Why is Stationarity Important?

  • Modeling Assumptions: Many time series models assume stationarity. ARMA models require a stationary series, and ARIMA models assume the series becomes stationary after differencing.

  • Predictability: Stationary series are easier to predict because their statistical properties are stable over time.

  • Statistical Inference: Non-stationary data can lead to invalid or misleading statistical inferences.


Non-Stationarity

Causes of Non-Stationarity

  • Trends: Long-term increase or decrease in the data.

  • Seasonality: Regular patterns repeating over time.

  • Structural Breaks: Changes in the underlying process generating the data.

  • Heteroscedasticity: Variance of the series changes over time.
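The sketch below makes these causes concrete. Each series is synthetic white noise with one source of non-stationarity added. All slopes, amplitudes, and break points are arbitrary values chosen for illustration. The output shows which summary statistic stops being constant in each case.

```python
import numpy as np

rng = np.random.default_rng(0)
n, period = 240, 12
t = np.arange(n)
noise = rng.normal(0.0, 1.0, n)

# Each series is white noise plus exactly one source of non-stationarity.
trend = 0.1 * t + noise                                  # mean drifts upward
seasonal = 5 * np.sin(2 * np.pi * t / period) + noise    # mean repeats every 12 steps
level_shift = np.where(t < n // 2, 0.0, 10.0) + noise    # structural break at the midpoint
hetero = noise * np.linspace(1.0, 5.0, n)                # spread grows over time

def halves(x):
    """Split a series into its first and second halves."""
    return x[: len(x) // 2], x[len(x) // 2 :]

a, b = halves(trend)
print(f"trend:       mean {a.mean():.2f} -> {b.mean():.2f}")
a, b = halves(level_shift)
print(f"level shift: mean {a.mean():.2f} -> {b.mean():.2f}")
a, b = halves(hetero)
print(f"hetero:      std  {a.std():.2f} -> {b.std():.2f}")

# With seasonality the mean depends on the position within the cycle,
# so average all observations that share the same month-of-cycle.
phase_means = seasonal.reshape(-1, period).mean(axis=0)
print("seasonal:    mean by phase", np.round(phase_means, 1))
```

The seasonal series has roughly the same mean in both halves. Its non-stationarity only appears when observations are grouped by their position within the cycle.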

Examples of Non-Stationary Series

  • Economic Indicators: GDP, inflation rates, stock prices.

  • Environmental Data: Temperature readings over decades.

  • Social Data: Population growth over time.


Dickey-Fuller Test

What is the Dickey-Fuller Test?

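The Dickey-Fuller (DF) test is a statistical test that checks for the presence of a unit root in a univariate time series sample. The presence of a unit root indicates that the time series is non-stationary. In practice, the Augmented Dickey-Fuller (ADF) test is usually applied. It includes lagged differences of the series in the test regression, so that serial correlation in the errors does not distort the result. The adfuller function from statsmodels, used later in this guide, implements the ADF test.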
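The test regressions can be written in standard textbook notation. The symbols \(\rho\), \(\delta\), \(\alpha\), \(\beta\), \(\lambda_i\), and \(p\) are introduced here for illustration. The basic DF test starts from an AR(1) model and subtracts \(y_{t-1}\) from both sides:

$$
y_t = \rho\, y_{t-1} + \varepsilon_t
\quad\Longleftrightarrow\quad
\Delta y_t = \delta\, y_{t-1} + \varepsilon_t, \qquad \delta = \rho - 1.
$$

The augmented form adds an optional constant \(\alpha\), an optional linear trend \(\beta t\), and \(p\) lagged differences:

$$
\Delta y_t = \alpha + \beta t + \delta\, y_{t-1} + \sum_{i=1}^{p} \lambda_i\, \Delta y_{t-i} + \varepsilon_t.
$$

The null hypothesis is \(\delta = 0\), which is equivalent to \(\rho = 1\), a unit root. It is tested against \(\delta < 0\). The test statistic is the t-ratio \(\hat\delta / \operatorname{SE}(\hat\delta)\). Under the null its distribution is non-standard, so it is compared with Dickey-Fuller critical values rather than the Student-t table.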

Why and When to Use It

  • Purpose: To determine whether differencing is required to achieve stationarity.

  • When to Use: Before fitting models that assume stationarity, such as ARIMA.

Interpreting the Results

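  • Null Hypothesis (\(H_0\)): The time series has a unit root (non-stationary).

  • Alternative Hypothesis (\(H_1\)): The time series is stationary (no unit root).

  • Decision Rule (either rule can be used at a chosen significance level):

    • If the p-value is less than the significance level (e.g., 0.05), reject \(H_0\).

    • If the test statistic is more negative than the critical value at that significance level, reject \(H_0\). The test is left-tailed.

    • Failing to reject \(H_0\) does not prove that a unit root exists. It only means the data do not provide enough evidence against one.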


Converting Non-Stationary Series to Stationary

Differencing

  • Definition: Subtracting the previous observation from the current observation.

  • Higher-Order Differences: Applying differencing multiple times if needed.
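A minimal pandas sketch of first- and second-order differencing follows. The two toy series are made up for illustration. First differencing turns a linear trend into a constant. Differencing twice does the same for a quadratic trend.

```python
import numpy as np
import pandas as pd

t = np.arange(10, dtype=float)
linear = pd.Series(3.0 + 2.0 * t)          # linear trend
quadratic = pd.Series(1.0 + 0.5 * t ** 2)  # quadratic trend

# First difference: y'_t = y_t - y_{t-1}; the first value becomes NaN.
d1_linear = linear.diff()

# Second-order difference: difference the already-differenced series again.
# Note: .diff(2) would instead compute y_t - y_{t-2}, a lag-2 difference.
d2_quadratic = quadratic.diff().diff()

print(d1_linear.dropna().unique())     # a single constant: the slope
print(d2_quadratic.dropna().unique())  # a single constant: twice the t**2 coefficient
```

The same second-order result can be obtained with NumPy as np.diff(values, n=2). That function drops the leading positions instead of filling them with NaN.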

Detrending

  • Definition: Removing the underlying trend component from the time series.

  • Methods:

    • Subtracting the Trend Line: Fit a regression line and subtract it.

    • Moving Average Smoothing: Use moving averages to estimate the trend.
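The sketch below shows both detrending methods on a synthetic series: a linear trend with slope 0.5 plus noise. It uses NumPy's polyfit for the regression line and a centered pandas rolling mean for the moving average. The window length of 9 is an arbitrary illustrative choice. For monthly seasonal data, a window matching the seasonal period is more common.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 120
t = np.arange(n)
# Illustrative series: linear trend (slope 0.5) plus Gaussian noise
y = pd.Series(0.5 * t + rng.normal(0, 1, n))

# Method 1: fit a least-squares line and subtract it.
slope, intercept = np.polyfit(t, y, deg=1)  # coefficients, highest degree first
detrended_line = y - (intercept + slope * t)

# Method 2: estimate the trend with a centered moving average and subtract it.
# With a window of 9, the first and last 4 points have no full window,
# so they become NaN and are dropped.
ma_trend = y.rolling(window=9, center=True).mean()
detrended_ma = (y - ma_trend).dropna()

print(f"estimated slope: {slope:.3f}")
print(f"points left after moving-average detrending: {len(detrended_ma)}")
```

The regression line assumes the trend's functional form, here a straight line. The moving average adapts to curved trends but loses observations at both ends.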

Deseasonalizing

  • Definition: Removing the seasonal component from the time series.

  • Methods:

    • Seasonal Decomposition: Using methods like STL (Seasonal-Trend Decomposition using Loess).

    • Seasonal Differencing: Differencing the series at seasonal lags.


Detecting Autocorrelation and Seasonality

Autocorrelation Function (ACF)

  • Purpose: Measures the correlation between the time series and its lagged values.

  • Usage: Identify the presence of autocorrelation and seasonal patterns.

Partial Autocorrelation Function (PACF)

  • Purpose: Measures the correlation between the time series and its lagged values, controlling for the effects of intermediate lags.

  • Usage: Helps in identifying the order of autoregressive terms.

Seasonal Decomposition

  • Purpose: Decompose the time series into trend, seasonal, and residual components.

  • Methods: Additive or multiplicative models, STL decomposition.


Practical Implementation with Python

Let's apply these concepts using the Python libraries pandas, NumPy, matplotlib, statsmodels, and scikit-learn (used for detrending).

Data Preparation

Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Statistical libraries
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

Generate Synthetic Non-Stationary Data

We'll create a time series with trend and seasonality.

# Create date range
dates = pd.date_range(start='2010-01-01', periods=120, freq='MS')  # monthly, month start ('M' is deprecated in pandas 2.2+)

# Generate data components
np.random.seed(42)
trend = np.linspace(10, 50, 120)
seasonality = 10 * np.sin(2 * np.pi * np.arange(120) / 12)  # one full cycle every 12 months
noise = np.random.normal(0, 2, 120)

# Combine components
data = trend + seasonality + noise

# Create DataFrame
df = pd.DataFrame({'Date': dates, 'Value': data}).set_index('Date')

Visualize the Time Series

plt.figure(figsize=(12, 6))
plt.plot(df['Value'], label='Time Series')
plt.title('Non-Stationary Time Series with Trend and Seasonality')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

Testing for Stationarity

Perform Dickey-Fuller Test

def adf_test(series, title=''):
    """
    Perform Augmented Dickey-Fuller test.
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(), autolag='AIC')
    labels = ['Test Statistic', 'p-value', '# Lags Used', '# Observations Used']
    out = pd.Series(result[0:4], index=labels)
    for key, val in result[4].items():
        out[f'Critical Value ({key})'] = val
    print(out.to_string())
    if result[1] <= 0.05:
        print("=> Reject the null hypothesis (unit root). The series appears stationary.")
    else:
        print("=> Fail to reject the null hypothesis. The series may be non-stationary.")

adf_test(df['Value'], 'Original Series')

Interpretation

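The p-value will most likely be greater than 0.05, so we fail to reject \(H_0\). The test finds no evidence against a unit root, which is consistent with the visible trend and seasonality.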

Converting to Stationary

Differencing

First Difference

df['First Difference'] = df['Value'].diff()  # y_t - y_{t-1}
adf_test(df['First Difference'], 'First Difference')

Seasonal Difference

df['Seasonal Difference'] = df['Value'].diff(12)  # y_t - y_{t-12}
adf_test(df['Seasonal Difference'], 'Seasonal Difference')

First Seasonal Difference

# Lag-12 difference applied to the already first-differenced series
df['First Seasonal Difference'] = df['First Difference'].diff(12)
adf_test(df['First Seasonal Difference'], 'First Seasonal Difference')

Detrending

Using Linear Regression (requires scikit-learn)

from sklearn.linear_model import LinearRegression

# Prepare data
df['Time'] = np.arange(len(df.index))
X = df[['Time']]
y = df['Value']

# Fit linear regression
model = LinearRegression()
model.fit(X, y)

# Calculate trend
df['Trend'] = model.predict(X)

# Detrend
df['Detrended'] = df['Value'] - df['Trend']

adf_test(df['Detrended'], 'Detrended Series')

Deseasonalizing

Using Seasonal Decomposition

# Decompose the time series
decomposition = seasonal_decompose(df['Value'], model='additive', period=12)

# Extract seasonal component
df['Seasonal'] = decomposition.seasonal

# Deseasonalize
df['Deseasonalized'] = df['Value'] - df['Seasonal']

adf_test(df['Deseasonalized'], 'Deseasonalized Series')

Detecting Autocorrelation and Seasonality

Plot ACF and PACF

# Original Series
fig, ax = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(df['Value'].dropna(), lags=50, ax=ax[0])
plot_pacf(df['Value'].dropna(), lags=50, ax=ax[1])
plt.tight_layout()
plt.show()

Interpretation

  • ACF Plot: Autocorrelations that decay slowly across many lags point to a trend (non-stationarity). Peaks at multiples of the seasonal period (lags 12, 24, 36 for monthly data) point to seasonality.

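  • PACF Plot: Helps determine the order of autoregressive terms. For an AR(p) process, the partial autocorrelations drop to near zero after lag p. This reading is most reliable on a series that has already been made stationary.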


Conclusion

  • Stationarity is crucial for time series modeling as many models assume the series is stationary.

  • The Dickey-Fuller test (and its augmented version, ADF) checks for a unit root. This helps determine whether a series is non-stationary and needs differencing.

  • Differencing, detrending, and deseasonalizing are techniques to convert a non-stationary series into a stationary one.

  • Autocorrelation and seasonality can be detected using ACF and PACF plots, as well as seasonal decomposition.

By transforming the data to achieve stationarity and understanding its underlying components, we can build more accurate and reliable forecasting models.


References

  • Time Series Analysis by James D. Hamilton

  • Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos

  • Statsmodels Documentation: Time Series Analysis


Written by

Sai Prasanna Maharana