Paper Summary: Artificial Intelligence and Inflation Forecasts

Link: https://www.stlouisfed.org/publications/review/2024/nov/artificial-intelligence-and-inflation-forecasts

{Disclaimer: This article is a summary of an existing research paper and is not original work by the author. The content has been simplified and condensed to share key ideas in a clear and accessible format for general understanding. If you are the original author and wish for this summary to be removed or modified, please contact me directly.}

  1. Introduction

This paper explores the use of Large Language Models (LLMs), specifically Google AI’s PaLM, to produce conditional inflation forecasts from 2019 to 2023. The authors compare LLM forecasts with those from the Survey of Professional Forecasters (SPF) and actual inflation data. Results show that LLM forecasts generally have lower mean-squared errors than SPF across most years and forecast horizons.

Inflation forecasting is critical for economic decisions but challenging due to its dependence on aggregated expectations. Traditional forecasting methods include:

  1. Expert surveys

  2. Individual surveys

  3. Market-based expectations

  4. Model-based approaches

each with limitations such as high cost or large forecast errors. The paper explores whether advanced Large Language Models (LLMs), a new type of AI, can help improve inflation forecasting. The findings suggest LLMs are promising tools for this task, outperforming traditional forecasts in accuracy during the study period, and the approach could extend to forecasting other economic variables.

  2. Methodology

Models used:

The model used in the study is Google AI’s PaLM. PaLM has two main features that make it preferable to GPT-4 for this exercise:

  • PaLM is trained on a large corpus of tokens that is regularly updated, so its information set is typically only a few days old.

  • Google has made the PaLM API freely available for academic research purposes (subject to daily usage limits), which allows us to automate output collection and treatment at no monetary cost.

Two types of forecasts are used in this study:

  • Conditional Forecast

  • Ex-post Forecast

Conditional forecast: The model is asked to forecast as of a past date (T), under the instruction not to use any information that was unavailable to the LLM as of that date (T).

Example prompt:

“Assume that you are in February 15, 2020. Please give me your best forecast of year-over-year seasonally adjusted CPI inflation in the US for 2020Q1-2021Q1. Please give me numeric values for these forecasts. Do not use any information that was not available to you as of February 15, 2020 to formulate these forecasts.”

Example output:

“The following are my best forecasts of year-over-year seasonally adjusted CPI inflation in the US for 2020Q1, 2020Q2, 2020Q3, 2020Q4, …….. “
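The conditional prompt above can be generated programmatically for any as-of date and forecast window. Below is a minimal sketch of how such prompts might be built; the commented-out `query_llm` call is a hypothetical placeholder for the PaLM API client, which the paper uses but whose exact interface is not specified here.

```python
from datetime import date

def build_conditional_prompt(as_of: date, start_q: str, end_q: str) -> str:
    """Construct a conditional-forecast prompt that pins the model's
    information set to a past date, following the paper's example."""
    d = as_of.strftime("%B %d, %Y")
    return (
        f"Assume that you are in {d}. Please give me your best forecast "
        f"of year-over-year seasonally adjusted CPI inflation in the US "
        f"for {start_q}-{end_q}. Please give me numeric values for these "
        f"forecasts. Do not use any information that was not available "
        f"to you as of {d} to formulate these forecasts."
    )

prompt = build_conditional_prompt(date(2020, 2, 15), "2020Q1", "2021Q1")
# response = query_llm(prompt)  # hypothetical PaLM API wrapper
```

Looping this builder over a grid of as-of dates is what makes it possible to collect forecasts for every quarter in the 2019–2023 sample.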

Ex-post Forecasts: The model is asked to predict inflation based on all information available to it, without restrictions on the time period.

“I need to know the CPI inflation rates in the US for each quarter in 2019Q1,. . . , 2023Q1. Please report the seasonally adjusted year-over-year inflation rates. And please tell me the source of the data that you are providing me with."

Ex-post forecasts closely track the realized inflation rates and are significantly different from the conditional forecasts. This suggests that PaLM uses a different set of inputs to generate the conditional forecasts, which means the conditional forecast is a valid method of evaluation. A simple test of this behavior is as follows:

Prompt: Answer the following question pretending today is Jan 1st 2020: "Is Queen Elizabeth II alive?"

Answer: Yes, Queen Elizabeth II is alive on January 1st, 2020. She was born on April 21st, 1926, and as of today, she is 93 years old.

Key challenges in this methodology

  • Training data and sample: Users cannot retrain the model on a subset of time-stamped data, which effectively prevents the use of LLMs to generate true out-of-sample forecasts.

  • Robustness: Responses provided by the LLM can be sensitive to the way the prompts are structured.

  • Reproducibility: There is a degree of randomness in the model’s output. Given the same prompt, the model can produce slightly different responses on different occasions.

  • Ability of the model to condition on past data: Lastly, an external validity issue arises from the LLM’s training data. The study relies on interpreting the LLM’s response to the prompt as informative about what the model would have forecast inflation to be had it been prompted in real time rather than retrospectively.
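One common mitigation for the reproducibility issue is to send the same prompt several times and average the parsed numbers across runs. A minimal sketch, assuming the replies have already been collected as free-text strings (the replies below are invented for illustration, not output from the paper):

```python
import re
from statistics import mean

def parse_forecasts(response: str) -> list[float]:
    """Extract numeric percentage values (e.g. '2.0%') from a free-text reply."""
    return [float(x) for x in re.findall(r"(-?\d+(?:\.\d+)?)\s*%", response)]

def average_runs(responses: list[str]) -> list[float]:
    """Average the parsed forecasts horizon-by-horizon across repeated runs
    of the same prompt, smoothing out the model's output randomness."""
    runs = [parse_forecasts(r) for r in responses]
    horizons = min(len(r) for r in runs)
    return [mean(run[h] for run in runs) for h in range(horizons)]

# Two hypothetical replies to the same conditional-forecast prompt.
replies = [
    "2020Q1: 2.0%, 2020Q2: 3.0%",
    "2020Q1: 3.0%, 2020Q2: 2.0%",
]
print(average_runs(replies))  # [2.5, 2.5]
```

Taking the minimum response length across runs guards against the model occasionally returning fewer horizons than requested.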

  3. Evaluation

Conditional Inflation Forecasts: PaLM’s forecasts consistently revert toward the Federal Reserve’s 2% inflation target, even when actual inflation deviates significantly. This behavior suggests the model is genuinely conditioning its forecasts on past information and not relying on future realized data.

Conditional Inflation Forecasts (PaLM vs. SPF): Compared to SPF, PaLM shows a slower and weaker mean reversion, which appears less sensitive to the current inflation level. While both sets of forecasts revert toward historical norms, PaLM’s responses are more nuanced and less anchored to the 2% target.

Assessing Forecast Quality: Quantitatively, PaLM achieves lower mean squared errors (MSEs) than SPF in most years and across most forecast horizons, particularly from 2020 to 2022. Although SPF performs slightly better in nowcasting (current quarter), PaLM dominates in accuracy for one to four quarters ahead.

  4. Conclusion

Baseline results showcase the potential of LLMs to generate forecasts. Standard measures of forecast performance suggest that PaLM, the LLM the paper focuses on, is able to generate conditional forecasts that are at least as good as, if not better than, one of the most trusted and respected sources of inflation forecasts, the SPF.

Written by

Madara Wimalarathna

I'm currently pursuing a Master's by Research in machine learning, and sharing my journey and learnings along the way.