06/17/2026 | News release | Distributed by Public on 06/17/2026 03:35
When Professor HAN Bo (Associate Professor, Department of Computer Science) looks at the trajectory of artificial intelligence, he sees extraordinary promise and a troubling blind spot. Foundation models, the massive systems powering everything from chatbots to clinical diagnostics, have demonstrated remarkable capabilities. But as they migrate into healthcare, finance, and autonomous platforms, a pressing question emerges: Can we trust them?
"Trustworthiness can no longer be an afterthought," says Professor Han, whose recent position article in IEEE Intelligent Systems offers a comprehensive framework for trustworthy machine learning. "It must be an integral design goal."
His analysis examines four essential pillars: learning, reasoning, planning, and multimodality.
Learning is how foundation models acquire knowledge. The standard recipe involves pretraining on massive datasets followed by fine-tuning. But Professor Han points to a counterintuitive finding: small, high-quality datasets often outperform large, low-quality ones. The real culprits are noise and bias in the data.
Reinforcement learning presents even greater challenges. It demands tedious hyperparameter tuning, meticulous reward design, and countless iterations. One rarely discussed element is unlearning: deliberately reducing a model's probability of producing certain responses. "Researchers have observed contradictory effects," Professor Han notes. "Some see impaired generalization. Others argue unlearning steers predictions toward more promising outputs." Without proper tuning, even minor data perturbations can trigger optimisation collapse.
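To make the idea of unlearning concrete, here is a toy sketch rather than the method described in the article. It assumes a "model" that is nothing more than three logits over three candidate responses, and it lowers the probability of one response by taking gradient steps that decrease its log-probability. The names `unlearn_step`, `target` and `lr` are illustrative assumptions.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unlearn_step(logits, target, lr):
    """One step that lowers the log-probability of response `target`.

    Uses d log p[target] / d logit[i] = (1 if i == target else 0) - p[i]
    and moves against that gradient.
    """
    probs = softmax(logits)
    grad = [(1.0 if i == target else 0.0) - p for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grad)]

# Hypothetical starting scores: response 0 is the one to be unlearned.
logits = [2.0, 1.0, 0.5]
before = softmax(logits)[0]

for _ in range(20):
    logits = unlearn_step(logits, target=0, lr=0.5)

after = softmax(logits)[0]
print(f"p(response 0) before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the probability removed from the unwanted response is redistributed across the remaining ones. That redistribution is one way to picture how unlearning could either help or hurt, depending on where the probability ends up.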
Foundation model reasoning has evolved from simple pattern matching to structured deduction. Methods include prompt-based techniques like chain-of-thought, post-training approaches, and external tools such as calculators and verified knowledge databases.
But threats abound. Adversarial prompts can force harmful reasoning patterns. Models must also handle noisy, incomplete information, which is the real-world default. Beyond safety lies explainability. "Rather than black-box reasoning, the future demands auditable, human-understandable logic pathways," Professor Han argues. When humans can trace where a logical chain went wrong, trust becomes possible.
For safety-critical settings such as autonomous vehicles and surgical robots, foundation models alone are insufficient. Their opaque reasoning and lack of formal guarantees make them unreliable. Professor Han advocates for neurosymbolic AI: the marriage of neural adaptability with symbolic verifiability.
"Symbolic representations decompose goals and enforce constraints," he explains. "Neural models process sensory inputs. Together, they support formal guarantees and human-readable reasoning." Systems like SayCan already demonstrate this approach, using language models to map goals to plans while symbolic controllers check feasibility.
Looking ahead, learning symbolic abstractions directly from foundation models, rather than hand-designing them, will enable generalizable planning while maintaining traceable reasoning.
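The division of labour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SayCan or any system from the article: the scored `candidates` stand in for plans proposed by a neural model, and the action table, state facts and function names (`ACTIONS`, `is_feasible`, `select_plan`) are all hypothetical. The symbolic side checks each plan step by step against preconditions and effects and accepts only a plan that provably reaches the goal.

```python
from typing import NamedTuple

class Action(NamedTuple):
    preconditions: frozenset  # facts that must hold before the action
    add_effects: frozenset    # facts that become true afterwards
    del_effects: frozenset    # facts that stop being true afterwards

# Hypothetical hand-written symbolic model of a tiny kitchen task.
ACTIONS = {
    "pick_up_cup": Action(
        frozenset({"hand_empty", "cup_on_table"}),
        frozenset({"holding_cup"}),
        frozenset({"hand_empty", "cup_on_table"}),
    ),
    "place_cup_in_sink": Action(
        frozenset({"holding_cup"}),
        frozenset({"cup_in_sink", "hand_empty"}),
        frozenset({"holding_cup"}),
    ),
}

def is_feasible(plan, initial_state, goal):
    """Symbolic check: simulate the plan and verify every precondition."""
    state = set(initial_state)
    for step in plan:
        action = ACTIONS.get(step)
        if action is None or not action.preconditions <= state:
            return False  # unknown action or unmet precondition
        state = (state - action.del_effects) | action.add_effects
    return goal <= state

def select_plan(candidates, initial_state, goal):
    """Return the highest-scoring candidate that passes the symbolic check."""
    for plan, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if is_feasible(plan, initial_state, goal):
            return plan
    return None

# Stand-in for neural proposals: (plan, confidence score) pairs.
candidates = [
    (["place_cup_in_sink"], 0.9),                 # confident but infeasible
    (["pick_up_cup", "place_cup_in_sink"], 0.7),  # less confident, valid
]
initial = {"hand_empty", "cup_on_table"}
goal = {"cup_in_sink"}

chosen = select_plan(candidates, initial, goal)
print(chosen)
```

The higher-scoring proposal is rejected because the cup is not yet in hand, so the checker falls back to the plan it can verify. The rejection also comes with a readable reason (an unmet precondition), which is the kind of traceable reasoning the neurosymbolic argument relies on.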
Multimodality may be the most critical step toward artificial general intelligence. Models like GPT-4V can process images and text together. But this richness cuts both ways.
"Multimodality can amplify bias dramatically," Professor Han warns. "A model might learn to associate 'programmers' with images of men, fusing linguistic biases with visual stereotypes into deeply embedded prejudice." Hallucination is another danger: models can generate descriptions of objects that are absent from the visual data. And for embodied AI, perceptual errors can have catastrophic physical consequences, such as an industrial robot mistaking a human arm for a part.
Professor Han's central imperative is clear: trustworthiness cannot be retrofitted. It must be infused from the ground up. This requires moving formal guarantees, neurosymbolic mechanisms, and value-sensitive optimisation to the centre of machine learning.
"Establishing trustworthy machine learning will require intensive interdisciplinary synergy, robust analytical methodologies, and scalable engineering approaches," he writes. The goal is intelligent systems that are powerful, principled, transparent, and aligned with human values.
As foundation models move from research labs into high-stakes decision making, the question is no longer whether we can build them. It is whether we can trust them. Professor Han's framework offers a roadmap for ensuring the answer is yes.
Full research paper: https://ieeexplore.ieee.org/abstract/document/11278237
Department of Computer Science
Professor Han's research profile: https://scholars.hkbu.edu.hk/en/persons/BHANML