Agora Inc.

09/23/2025 | News release | Distributed by Public on 09/24/2025 04:07

The Open-Source VAD Upgrade That Benefits Billions of Live Stream Viewers

There are already more than a million streamers worldwide. But with different accents, speaking speeds, and specialized jargon, viewers can struggle to follow the content, and streamers who want to reach a global audience benefit greatly from subtitles. LiveCap, a tool created by developer Hakase Shojo, was built to solve this exact problem. It generates subtitles in real time, supporting both Japanese and English to help viewers follow along and bridge language gaps.

While real-time subtitling might look straightforward, it comes with significant technical challenges in practice:

Subtitles often lag behind the streamer's voice.
Speech-to-text accuracy is easily affected by the environment and speaking style.
Silent segments get pushed into the speech recognition model, wasting resources.

To deliver a smoother, more reliable experience, LiveCap needed a better way to handle audio.

Evolution From Silence to Speech

Early versions of LiveCap relied on "silence detection", a technique that identified a pause in speech to determine the end of a sentence before sending the audio to the recognition model. The problem was the delay: subtitles could only begin generating after a pause, creating a frustrating gap between what the streamer said and what appeared on screen.
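
To make the delay concrete, here is a minimal sketch of pause-based segmentation. It is not LiveCap's actual code: the RMS energy test, the `energy_threshold` and `min_pause_frames` parameters, and the function names are all assumptions chosen for illustration. The point it shows is that an utterance is only released after enough quiet frames have accumulated, so recognition cannot start until the pause is over.

```python
from typing import Iterable, Iterator, List

Frame = List[float]  # one short block of audio samples in the range [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def split_on_silence(
    frames: Iterable[Frame],
    energy_threshold: float = 0.01,  # hypothetical value, tune per microphone
    min_pause_frames: int = 3,       # hypothetical pause length that ends a sentence
) -> Iterator[List[Frame]]:
    """Yield one utterance at a time, but only after a long enough pause."""
    utterance: List[Frame] = []
    pending_quiet: List[Frame] = []
    for frame in frames:
        if rms(frame) >= energy_threshold:
            # Speech resumed: a short pause inside the sentence is kept.
            utterance.extend(pending_quiet)
            pending_quiet = []
            utterance.append(frame)
        elif utterance:
            pending_quiet.append(frame)
            if len(pending_quiet) >= min_pause_frames:
                # Only now can the audio go to the recognizer.
                yield utterance
                utterance, pending_quiet = [], []
    if utterance:
        yield utterance  # flush whatever is left when the stream ends
```

With these assumed settings, every subtitle waits at least `min_pause_frames` frames after the speaker stops, on top of recognition time.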

The solution was Voice Activity Detection (VAD). This technology continuously identifies human speech, a far more efficient approach. But even here, not all tools are created equal.

Initially, LiveCap used Silero VAD, but it would often cut off the ends of sentences and produce unnatural, confusing transcripts. After multiple tests and comparisons, Hakase Shojo switched to the open-source project TEN VAD.

The results were remarkable. TEN VAD offered faster, more accurate detection and proved highly stable in Japanese environments. LiveCap fully replaced Silero VAD with TEN VAD, and the false detection rate dropped from 67% to under 5%.

How TEN VAD Empowered LiveCap

More accurate speech detection: TEN VAD consistently and accurately detects human voice, even in challenging scenarios like the inflections of Japanese sentence endings, dramatically reducing false detections.

Ultra-low latency: With its rapid response time, TEN VAD is a perfect fit for real-time applications. It sharply identifies speech start and end points, keeping subtitles almost perfectly in sync with the streamer's voice and improving the viewing experience.

Lightweight and resource-efficient: The model is compact and consumes minimal CPU and memory. By detecting silence and noise, it avoids wasting resources on irrelevant audio.

Foundation for downstream tasks: LiveCap's speech recognition model requires audio chunks under five seconds. TEN VAD helps by splitting longer speech into precise sub-segments, enabling more stable and accurate transcription in real time.
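
The last two points can be sketched in a few lines. The example below does not call TEN VAD itself; it assumes some VAD has already produced one speech/non-speech flag per audio frame, and the function names and frame sizes are hypothetical. It groups speech frames into segments, so silent frames never reach the recognizer, and then cuts any segment longer than a frame budget. The cut here is a plain fixed-length split; a production system would more likely cut at the point where the VAD is least confident that speech is present.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # [start, end) frame indices


def speech_segments(flags: List[bool]) -> List[Segment]:
    """Collapse per-frame speech flags into contiguous speech segments."""
    segments: List[Segment] = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i                      # speech begins
        elif not is_speech and start is not None:
            segments.append((start, i))    # speech ends
            start = None
    if start is not None:
        segments.append((start, len(flags)))  # stream ended mid-speech
    return segments


def split_long(segments: List[Segment], max_frames: int) -> List[Segment]:
    """Cut every segment into pieces of at most max_frames frames."""
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    chunks: List[Segment] = []
    for start, end in segments:
        while end - start > max_frames:
            chunks.append((start, start + max_frames))
            start += max_frames
        chunks.append((start, end))
    return chunks


# Usage, assuming 10 ms frames: 499 frames keeps every chunk under 5 seconds.
flags = [False] * 50 + [True] * 1200 + [False] * 30
chunks = split_long(speech_segments(flags), max_frames=499)
```

Passing the budget as a frame count rather than as seconds keeps the limit exact and avoids floating-point rounding at the five-second boundary.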

"By integrating TEN VAD, LiveCap achieved much more natural transcripts in Japanese, reducing user frustration and increasing trust in the product during live usage," Hakase Shojo commented. He also noted that VAD-related technical details are rarely discussed.

But as this story shows, these seemingly simple technical details often hold the key to boosting a product's performance. By openly sharing these behind-the-scenes stories, Hakase Shojo is not only providing streamers with a powerful tool but also offering a valuable insight to fellow voice AI developers: choosing the right tool for the scenario is the fastest way to solve technical challenges.

Beyond Subtitles: A Foundational Technology

The power of TEN VAD extends far beyond live subtitling. Its benefits are applicable to a wide range of real-time voice scenarios:

In AI customer service, it enables faster responses to customer inquiries.

In AI tutoring, it can accurately detect even the briefest, most hesitant utterances from a user.

In short, TEN VAD is a foundational capability for building real-time voice applications, whether for live streaming, conversational AI, or voice agents.

TEN VAD: https://github.com/TEN-framework/ten-vad
LiveCap: https://store.steampowered.com/app/3529970/LiveCap/

Agora Inc. published this content on September 23, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on September 24, 2025 at 10:07 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]