NUS - National University of Singapore

06/04/2026 | News release | Distributed by Public on 06/04/2026 00:38

The engine behind the AI revolution: NVIDIA’s chief scientist William Dally on what really drives computing forward

04 June 2026 | 14:18 Asia/Singapore

In his lecture, Dr William Dally, NVIDIA's Chief Scientist and Senior Vice President of Research, traced the evolution of deep learning and AI, highlighted the critical role of computer hardware - particularly the GPUs (graphics processing units) developed by NVIDIA - in enabling the AI revolution, and outlined future directions for GPU design.

While attention is often placed on the algorithms and models that enable AI to advance, much credit is also owed to hardware, specifically graphics processing units (GPUs).

They are designed to efficiently process a large volume of data simultaneously, making them essential for training AI models, said Dr William Dally, Chief Scientist and Senior Vice President of Research at American technology company NVIDIA, at the NUS120 Distinguished Speaker Series.

Delivering a lecture on "Shaping the Future Through Computing Innovation" on 20 May 2026, the renowned computer scientist noted that algorithms behind deep learning were largely developed in the 1980s, and the data needed to train AI systems was available by the late 2000s.

What was lacking at that time was enough computing power to put the two together. That missing ingredient was the GPU, a chip originally designed for video games, said Dr Dally, who has shaped the foundations of today's AI revolution through his pioneering work in parallel computing and high-performance interconnects.

"Think of that (data and algorithms) as a fuel in the air, and they were waiting for a spark to ignite them and really light off the AI revolution," he told some 180 students, faculty, alumni, and members of the public.

That spark was sufficient computing power, exemplified by the landmark AlexNet model in 2012, which was trained on just two NVIDIA GPUs. Since then, the computing power required to train a state-of-the-art AI model has grown by roughly 10 million times. A single NVIDIA GPU has become around 5,000 times more powerful over the same period.
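The two growth figures above can be set against each other with a line of arithmetic. The sketch below is only an illustration using the rounded numbers quoted in the lecture; the function name `remaining_gap` is made up for this example.

```python
# Figures quoted in the lecture summary (both are rough, order-of-magnitude values).
TRAINING_COMPUTE_GROWTH = 10_000_000  # growth in compute needed to train a state-of-the-art model since 2012
SINGLE_GPU_GROWTH = 5_000             # growth in the power of a single GPU over the same period


def remaining_gap(compute_growth: float, gpu_growth: float) -> float:
    """Factor left over once single-GPU gains are accounted for."""
    return compute_growth / gpu_growth


gap = remaining_gap(TRAINING_COMPUTE_GROWTH, SINGLE_GPU_GROWTH)
print(f"Gap not covered by a single GPU: about {gap:,.0f}x")
```

On these rounded figures, a factor of roughly 2,000 is left over, which has to come from elsewhere, such as using many GPUs together.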

But with AI rapidly permeating all aspects of life, the demand for computing power to fuel AI has grown to unprecedented levels.

"As models grow more ambitious, the infrastructure behind them has become as consequential as the ideas within them. This tension is reshaping how systems are designed," said NUS President Professor Tan Eng Chye in his opening remarks.

Behind every frontier AI model lies a coordination issue: getting thousands of processors to work in concert, communicating at speed and scale.

"Dr Dally's work on high-performance interconnects has been instrumental in making that coordination feasible for the large-scale systems that define the AI we see today," added Prof Tan.

Delivering the opening remarks, NUS President Professor Tan Eng Chye lauded Dr Dally's foundational contributions to parallel computing and high-performance interconnects, noting that they have helped shape the infrastructure underpinning modern AI.

Getting more for less

To illustrate how AI models process information, Dr Dally offered a simple example. When a person asks a large language model (an AI system trained on vast amounts of text to learn patterns in language), "Is an apple a fruit?", the system reads every word at once.

But when the AI model generates the answer, it produces one word at a time, and each word requires reading the model's entire stored knowledge before the next one can appear. With today's largest models containing trillions of stored values, that adds up fast.
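A rough count shows how quickly this adds up. The sketch below is a simplification that assumes exactly one full pass over the stored values per generated word; the model size and answer length are hypothetical, chosen only for illustration.

```python
def values_read(num_stored_values: int, words_generated: int) -> int:
    """Stored values read while generating an answer, assuming one full pass per word."""
    return num_stored_values * words_generated


NUM_STORED_VALUES = 10**12  # assumption: a model with one trillion stored values
WORDS_IN_ANSWER = 20        # assumption: a hypothetical 20-word answer

total = values_read(NUM_STORED_VALUES, WORDS_IN_ANSWER)
print(f"{total:.1e} values read to generate {WORDS_IN_ANSWER} words")
```

Under these assumptions, a 20-word answer means reading 20 trillion values, which is why lighter stored values matter.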

One way to improve efficiency is to make those stored values lighter, noted Dr Dally, who was a professor at the Massachusetts Institute of Technology and Stanford University before joining NVIDIA in 2009.

Computers represent numbers as strings of 0s and 1s, called bits - more bits mean greater precision, but also more energy per calculation.

As AI models can function with cruder approximations than traditional computing, NVIDIA progressively reduced the number of bits used: from 32 to 16, then eight, and now four. Each step roughly quadrupled energy efficiency.
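To make the effect of fewer bits concrete, the sketch below works out how much storage a model's values would occupy at each width. The one-trillion figure is an assumption for illustration, and the calculation covers storage size only, not the energy figures quoted in the lecture.

```python
def weight_bytes(num_values: int, bits_per_value: int) -> float:
    """Bytes needed to store num_values numbers at the given bit width."""
    return num_values * bits_per_value / 8


NUM_VALUES = 10**12  # assumption: a model with one trillion stored values

for bits in (32, 16, 8, 4):
    terabytes = weight_bytes(NUM_VALUES, bits) / 1e12
    print(f"{bits:>2}-bit: {terabytes:.1f} TB")
```

Each halving of the bit width halves the data that must be stored and read for every generated word.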

To mitigate the risk of reduced accuracy caused by the accumulation of rounding errors, Dr Dally shared how a technique called sparsity can be applied. Zeroes are identified and skipped during calculation, effectively doubling the useful work done per unit of energy - a gain that has been built into every NVIDIA GPU since the Ampere generation, introduced in 2020.
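The idea of skipping zeroes can be shown with a toy calculation. This is a software sketch only, not a description of how NVIDIA's hardware implements sparsity; the function name and the sample numbers are invented for the example.

```python
def sparse_dot(weights: list[float], inputs: list[float]) -> tuple[float, int]:
    """Multiply-and-sum two lists, skipping any position where the weight is zero.

    Returns the result and the number of multiplications actually performed.
    """
    total = 0.0
    multiplies = 0
    for w, x in zip(weights, inputs):
        if w == 0:
            continue  # a zero weight contributes nothing, so the work is skipped
        total += w * x
        multiplies += 1
    return total, multiplies


# Hypothetical values: half of the weights are zero.
weights = [0.5, 0.0, -1.0, 0.0]
inputs = [2.0, 3.0, 4.0, 5.0]

result, multiplies = sparse_dot(weights, inputs)
print(result, multiplies)
```

With half the weights at zero, the same answer is reached with half the multiplications, which matches the doubling of useful work described above.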

In a Q&A moderated by Professor Tulika Mitra, Vice Provost (Special Projects) and Dean of NUS Computing, Dr Dally noted that hardware and systems design may become more valuable in the AI era because much of it still relies heavily on human creativity. While software opportunities remain, he foresees the work increasingly shifting towards supervising AI coding agents.

The next frontier

Despite the substantial improvement in a single GPU, it still falls far short of what the most demanding AI models require.

Dr Dally explained how modern AI workloads are distributed across many GPUs at once, with different processors handling different parts of a model. This requires fast, reliable links between chips. NVIDIA's NVLink technology, for instance, allows GPUs within a single cabinet to exchange data at around 1.8 terabytes per second.
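For a sense of scale, the sketch below estimates how long it would take to move a block of data over a link at the quoted rate. The payload size is a hypothetical figure, and the estimate ignores latency and protocol overhead, so it should be read as a lower bound rather than a measured result.

```python
LINK_BYTES_PER_SECOND = 1.8e12  # about 1.8 terabytes per second, as quoted in the lecture


def transfer_seconds(payload_bytes: float, bytes_per_second: float) -> float:
    """Idealised transfer time: payload size divided by link bandwidth."""
    return payload_bytes / bytes_per_second


PAYLOAD_BYTES = 0.5e12  # assumption: a hypothetical 0.5 TB of data to move between GPUs

seconds = transfer_seconds(PAYLOAD_BYTES, LINK_BYTES_PER_SECOND)
print(f"About {seconds:.2f} s to move {PAYLOAD_BYTES / 1e12:.1f} TB")
```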

However, he noted that hardware improvements can only go so far without software to match. For example, when NVIDIA optimised its software after a new chip launch, performance improved by between two-and-a-half and three times - on identical hardware.

The lecture was followed by a Q&A moderated by Professor Tulika Mitra, Vice Provost (Special Projects) and Dean of NUS Computing, with audience questions ranging from the trade-offs between specialised and general-purpose hardware to the outlook for computing formats beyond 4-bit.

Looking ahead, Dr Dally speculated that AI might one day help design future chips, reducing a process that currently requires thousands of man-years of engineering work.

"I would love to see people look at applying AI to reducing the amount of energy required to turn out a new GPU," he said.

NUS - National University of Singapore published this content on June 04, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on June 04, 2026 at 06:38 UTC.