University of California, Riverside

01/21/2026 | Press release

Making AI smarter without more training data

A study led by UC Riverside researchers offers a practical fix to one of artificial intelligence's toughest challenges by enabling AI systems to reason more like humans, without requiring any new training data beyond the test questions themselves.

In a pre-print paper titled "Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models," assistant professor Yinglun Zhu and his students introduce a novel method called Test-Time Matching, or TTM. The approach significantly improves how AI systems interpret relationships between text and images, especially when presented with unfamiliar combinations.

Yinglun Zhu

"Compositional reasoning is about generalizing in the way humans do and understanding new combinations based on known parts," said Zhu, who led the study and is a member of the Department of Electrical and Computer Engineering at the Bourns College of Engineering. "It's essential for developing AI that can make sense of the world, not just memorize patterns."

Today's leading AI models perform well on many tasks, but they can falter when asked to align visual scenes with language under compositional stress, such as when familiar objects and relationships are rearranged and described in new ways.

Researchers use specialized tests to evaluate whether AI models can integrate concepts in the way people do. Yet models often perform no better than chance, suggesting they struggle to grasp the nuanced relationships between words and images.

Zhu's team found that existing evaluations may unfairly penalize models.

Widely used evaluation metrics rely on isolated pairwise comparisons, imposing extra constraints that can obscure the best overall matching between images and captions, Zhu said.

To address this, the team created a new evaluation metric that identifies the best overall matching across a group of image-caption pairs. This metric improved scores and revealed previously unseen model capabilities.
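The release does not spell out the metric itself. As a rough illustration only, the sketch below scores a small group of image-caption pairs by whether the model's best overall assignment recovers the true pairing, using a similarity matrix and SciPy's linear_sum_assignment; these specifics are assumptions made for the sketch, not details taken from the paper.

```python
# Illustrative sketch (not the paper's exact metric): credit a group of image-caption
# pairs when the model's best overall assignment recovers the true pairing, rather
# than requiring every isolated pairwise comparison to come out right.
import numpy as np
from scipy.optimize import linear_sum_assignment

def group_match_score(similarity: np.ndarray) -> float:
    """similarity[i, j] = model similarity between image i and caption j.
    The true pairing is assumed to be image i <-> caption i."""
    # Find the assignment of captions to images with the highest total similarity.
    _, col_idx = linear_sum_assignment(-similarity)  # negate to maximize similarity
    # Score 1.0 when the best overall matching equals the ground-truth pairing.
    return float(np.all(col_idx == np.arange(similarity.shape[0])))

# Example: a 2x2 group where each image is slightly closer to its own caption.
sims = np.array([[0.62, 0.60],
                 [0.55, 0.58]])
print(group_match_score(sims))  # 1.0 under group matching
```

In this toy example, an isolated pairwise check would mark the group wrong (caption 1 scores slightly higher with image 0 than with image 1), even though the best overall matching is still the correct one.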

(Image generated by Gemini)

Building on this insight, the researchers then developed Test-Time Matching, or TTM, a technique that allows AI systems to improve as they are used, without any external supervision.

The method works by having the AI model predict matches between images and captions, selecting its most confident predictions. The model then fine-tunes itself using those predictions, repeating the process to refine its performance. This self-improvement process mimics how people use context to reason more effectively.
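The release describes TTM only at this high level. The sketch below is a simplified, illustrative version of such a predict-then-fine-tune loop, not the paper's actual algorithm: the matching rule, the confidence measure (a margin over the runner-up), and the score_fn and update_fn hooks are assumptions introduced for illustration.

```python
# Minimal sketch of an iterative "predict matches, keep the confident ones, fine-tune"
# loop of the kind described above. All details here are illustrative stand-ins.
import numpy as np
from scipy.optimize import linear_sum_assignment

def predict_matching(similarity):
    """Return the highest-scoring caption-to-image assignment for one group,
    plus a confidence score (average margin of each pick over its runner-up)."""
    rows, cols = linear_sum_assignment(-similarity)
    margins = [similarity[i, cols[i]] - np.max(np.delete(similarity[i], cols[i]))
               for i in rows]
    return cols, float(np.mean(margins))

def test_time_matching(score_fn, update_fn, groups, rounds=3, keep_frac=0.3):
    """score_fn(images, captions) -> similarity matrix from the current model;
    update_fn(trusted) fine-tunes that model in place on pseudo-labeled groups,
    so each new round scores with the improved model."""
    for _ in range(rounds):
        labeled = []
        for images, captions in groups:
            sims = score_fn(images, captions)            # 1. score all pairs in the group
            pairs, conf = predict_matching(sims)         # 2. predict the group's matching
            labeled.append((images, captions, pairs, conf))
        labeled.sort(key=lambda g: g[-1], reverse=True)  # 3. keep the most confident groups
        update_fn(labeled[: max(1, int(keep_frac * len(labeled)))])  # 4. fine-tune, repeat
```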

The researchers tested their method on SigLIP-B16, a relatively small vision-language model designed to understand and connect visual and textual information. With TTM, SigLIP-B16 significantly improved its performance on compositional reasoning benchmarks, achieving or exceeding previous state-of-the-art results.

In one test, TTM boosted SigLIP-B16's performance on a benchmark dataset known as MMVP-VLM to 89.4%, surpassing GPT-4.1.

"Even smaller models have the capacity for strong reasoning," Zhu said. "We just need to unlock it with better evaluation and smarter test-time methods."

The study suggests that test-time adaptation strategies like TTM could become essential as AI expands into real-world settings such as robotics, autonomous vehicles, and healthcare, domains where systems must quickly adapt to new situations.

Zhu's findings challenge the prevailing assumption that larger models are always better. Instead, he calls for rethinking how AI systems are evaluated and deployed.

"Sometimes, the problem isn't the model. It's how we're using it," he said.

The full paper, co-authored with UCR's Jiancheng Zhang and Fuzhi Tang, is available on arXiv.



(Header conceptual image/Getty Images)


