Federal Reserve Bank of San Francisco

02/05/2026 | Press release | Distributed by Public on 02/05/2026 11:03

ChatMacro: Evaluating Inflation Forecasts of Generative AI

Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions.

Suggested citation:

Alam, M. Jahangir, Shane Boyle, Huiyu Li, and Tatevik Sekhposyan. 2026. "ChatMacro: Evaluating Inflation Forecasts of Generative AI." Federal Reserve Bank of San Francisco Working Paper 2026-04. https://doi.org/10.24148/wp2026-04
