Google LLC

09/11/2025 | Press release | Distributed by Public on 09/11/2025 11:28

How AI made Meet’s language translation possible

Like most Googlers, Fredric Lindstrom and Huib Kleinhout spend a lot of time talking to coworkers via Google Meet. And a lot of their meetings happen across continents, time zones and languages. Fredric and Huib are based in Sweden and Norway, respectively, and are working on Speech Translation, Google Meet's new real-time language translation feature. Speech Translation allows you to communicate with anyone on your call, even if you speak a different language. As you talk, Speech Translation automatically translates your speech in near real time, in a voice like yours. The idea is to make sure language isn't a barrier to communication, whether you're planning a vacation and talking to someone in another country or chatting with friends or family who don't speak the same language.

Fredric, who leads the audio engineering team in Meet, has watched AI transform what his team is capable of doing. His team began working on Speech Translation about two years ago. At the time, existing models could handle offline translation; the challenge lay in making it instantaneous, which would be necessary for live Google Meet calls. The team believed it was possible, so they began working with the Google DeepMind team. "When we started, we thought, 'Maybe this will take five years,'" Fredric explains. Two years later, here we are. "As things go with AI," he says, "things just went faster and faster. Now, there's a whole Google community with engineers from Pixel, Cloud, Chrome and more working together with Google DeepMind to achieve real-time speech translation."

Breakthroughs in translation technology

Previous audio translation technologies relied on a multi-step process: Transcribe the speech, translate the text, then convert it back to speech. This chain resulted in significant latency, often 10-20 seconds, making natural conversation impossible. And translated voices were generic, failing to capture the speaker's unique characteristics.

The true breakthrough, as Huib (who leads product management for audio quality) explains, came from "large models": not necessarily large language models (LLMs), but models capable of "one-shot" translation. "You send audio in and almost immediately, the model starts outputting audio," he notes. This drastically reduced latency, bringing it close to the pace at which a human interpreter processes and delivers speech. "We discovered that two to three seconds was sort of a sweet spot," Huib says. Anything faster was difficult to understand; anything slower didn't lend itself to natural conversation. Once the team hit this timing, the model made simultaneous conversation across different languages feasible in Google Meet.

Problem solving and big improvements

Developing this complex feature was not without its hurdles. One of the most critical aspects was ensuring high-quality translation, which can vary greatly depending on factors like speaker accent, background noise or network conditions. The Meet and DeepMind teams worked together to resolve these issues, testing models and adjusting them based on real-world performance.

Part of that testing involved working with linguists and other language experts to really understand the nuances not only of translation but of accents as well. Closely related languages, like Spanish, Italian, Portuguese and French, were easier to integrate, while structurally different languages such as German presented greater challenges due to variations in everything from grammar to common idioms. Currently, the model also translates most expressions literally, which can lead to amusing misunderstandings, Huib and Fredric note. However, they expect that updates using advanced LLMs will grasp and translate such nuances more accurately, even capturing tone and irony.

From AI research to reality

Both Huib and Fredric are excited about seeing our cutting-edge AI research become a reality, especially in a project that will help such a vast, global audience. As of today, speech translation in Google Meet is available in Italian, Portuguese, German and French. "It's been really rewarding to hear from people who are immigrants, who moved from another part of the world to the U.S. with parents or grandparents who don't speak English and have never spoken to their grandchildren," Fredric explains. "Now they have a common language. All of a sudden, they can talk to each other. This technology bridges those kinds of gaps."

Google LLC published this content on September 11, 2025, and is solely responsible for the information contained herein.