09/15/2025 | News release

The hidden incentives driving AI hallucinations

That uncertainty has led the authors to focus on evaluation. One option is to tweak training so that models are penalized less for saying "I don't know." But Vempala warned that this could upset the very balance that makes them sound fluent.

"A potential change would be to penalize 'IDK' less than incorrect next-token prediction during pre-training, but this might have other undesirable consequences," Vempala said. "Since we do not fully understand why pre-training with next-token prediction and standard log loss works so well to generate entire documents, it is unclear if such a change in the objective might reduce overall performance."

Fixing benchmarks may help, but Soule said that won't solve everything. "There are always going to be hallucinations," she said on the Mixture of Experts podcast. "We are going to need a combination of tools, symbolic approaches, and verification layers on top of the models to detect when a statement lacks evidence in the grounding context."
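What a verification layer might check is easier to see with a deliberately naive sketch. The function below, a hypothetical lacks_evidence helper, flags a generated claim whose content words barely appear in the grounding passages; real verification layers would typically rely on entailment models or retrieval scoring rather than this lexical overlap, and the threshold value is an arbitrary assumption.

```python
import re

def lacks_evidence(claim: str, grounding_passages: list[str], threshold: float = 0.6) -> bool:
    """Naive check: flag a claim whose content words are mostly absent
    from the grounding passages. Lexical overlap is used here only to
    illustrate the idea of a verification layer."""
    def content_words(text: str) -> set[str]:
        return set(re.findall(r"[a-z]{4,}", text.lower()))  # crude proxy for content words

    claim_words = content_words(claim)
    if not claim_words:
        return False
    context_words = set().union(*(content_words(p) for p in grounding_passages))
    coverage = len(claim_words & context_words) / len(claim_words)
    return coverage < threshold

context = ["Larimar gives language models an editable, short-term memory layer."]
print(lacks_evidence("Larimar adds an editable memory layer.", context))   # False: supported
print(lacks_evidence("Larimar was trained on medical records.", context))  # True: unsupported
```

Even this crude check reflects the division of labor Soule describes: the model generates, and a separate layer decides whether the statement is supported by the grounding context.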

The study lands as the industry races to cut hallucinations. IBM researchers are also exploring new approaches to the problem. One project, called Larimar, is designed to give models a form of short-term, editable memory. The idea is to allow AI systems to revise or discard information in real time rather than carry it forward indefinitely. That flexibility could reduce the risk of errors compounding or persisting, and it may help models stay accurate without forcing developers into the costly process of retraining from scratch.

Larimar builds on the observation that current systems lack mechanisms to update specific facts once training is complete. By introducing a layer of memory that can be edited, the approach enables models to adjust to new or corrected information as they operate.
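As a rough illustration only, a toy key-value store conveys what inference-time write, read and forget operations buy you: facts can be corrected or dropped mid-conversation without touching the trained model weights. The EditableMemory class and the example fact below are hypothetical and are not the Larimar architecture itself, which the article describes only as an editable memory layer added to the model.

```python
class EditableMemory:
    """Toy stand-in for an editable memory layer (not the actual Larimar
    architecture): facts can be written, read and discarded at inference
    time, with no retraining of the underlying model."""

    def __init__(self):
        self._facts: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        """Store or overwrite a fact mid-conversation."""
        self._facts[key] = value

    def read(self, key: str) -> str | None:
        """Look up an edited fact before falling back to the frozen model."""
        return self._facts.get(key)

    def forget(self, key: str) -> None:
        """Discard a fact so it is not carried forward indefinitely."""
        self._facts.pop(key, None)


memory = EditableMemory()
memory.write("project codename", "Larimar")  # hypothetical example fact
print(memory.read("project codename"))       # -> "Larimar"
memory.forget("project codename")
print(memory.read("project codename"))       # -> None: the edit is gone
```

The point of the sketch is the interface rather than the implementation: edits live outside the trained weights, so correcting one fact does not require retraining everything else.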

Payel Das, a Principal Research Staff Member and Manager of Trusted AI at IBM Research, described Larimar as a way of aligning model performance more closely with how humans remember, revise and sometimes forget.

"Models today are static and brittle," Das told IBM Think in an interview. "You can't teach them something mid-conversation or update their understanding without retraining them entirely. Larimar is a step toward making them more flexible."

Hallucinations aren't going away. But with new tools like Larimar and a better understanding of how training incentives fuel bluffing, researchers are finding ways to keep them in check.
