09/23/2025 | News release | Distributed by Public on 09/24/2025 04:07
There are already more than a million streamers worldwide. But differing accents, speaking speeds, and specialized jargon can make their content hard for viewers to follow, and subtitles become even more valuable when streamers want to reach a global audience. LiveCap, software created by developer Hakase Shojo, was built to solve this exact problem. It generates subtitles in real time, supporting both Japanese and English, to help viewers follow along and bridge language gaps.
While real-time subtitling might look straightforward, it comes with significant technical challenges in practice.
To deliver a smoother, more reliable experience, LiveCap needed a better way to handle audio.
Early versions of LiveCap relied on "silence detection", a technique that identified a pause in speech to determine the end of a sentence before sending the audio to the recognition model. The problem was the delay: subtitles could only begin generating after a pause, creating a frustrating gap between what the streamer said and what appeared on screen.
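To make the delay concrete, here is a minimal Python sketch of pause-based segmentation. It is an illustration only, not LiveCap's actual code: the function name `segment_on_silence`, the per-frame energy input, the threshold, and the `min_silence_frames` parameter are all assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple


def segment_on_silence(
    frame_energies: Iterable[float],
    threshold: float = 0.01,       # assumed energy level separating speech from silence
    min_silence_frames: int = 3,   # assumed pause length that ends a sentence
) -> Iterator[Tuple[int, int, int]]:
    """Yield (first_speech_frame, last_speech_frame, emitted_at_frame).

    A segment is released only after `min_silence_frames` quiet frames in a
    row, so `emitted_at_frame` always trails `last_speech_frame`.
    """
    start = None
    last_speech = None
    silent_run = 0
    last_index = -1
    for i, energy in enumerate(frame_energies):
        last_index = i
        if energy >= threshold:
            if start is None:
                start = i
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                yield (start, last_speech, i)
                start = None
                silent_run = 0
    # Flush speech that was still open when the input ended.
    if start is not None:
        yield (start, last_speech, last_index)


energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.4, 0.0]
print(list(segment_on_silence(energies)))  # [(1, 2, 5), (6, 6, 7)]
```

In the first segment, speech ends at frame 2 but nothing is handed to the recognizer until frame 5. That built-in wait is the gap viewers saw between the streamer's words and the subtitles.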
The solution was Voice Activity Detection (VAD). Rather than waiting for a pause, this technology continuously identifies human speech in the audio stream, a far more efficient approach. But even here, not all tools are created equal.
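As a rough sketch of the frame-by-frame idea, the Python snippet below forwards each frame as soon as it is judged to contain speech. It does not use TEN VAD's or Silero VAD's real API: `is_speech` is a hypothetical stand-in for whatever VAD model makes the decision, and the `hangover_frames` tail (a common way to avoid clipping word endings) is an assumption of this example, not something the source attributes to either tool.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def stream_speech_frames(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],  # hypothetical stand-in for a VAD model
    hangover_frames: int = 2,            # assumed tail kept after speech stops
) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for speech frames plus a short trailing tail."""
    remaining = 0
    for i, frame in enumerate(frames):
        if is_speech(frame):
            remaining = hangover_frames
            yield i, frame           # forwarded immediately, no pause needed
        elif remaining > 0:
            remaining -= 1
            yield i, frame           # keep a little audio after the last word


# Toy input: each "frame" is just an energy value.
energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.4, 0.0]
kept = [i for i, _ in stream_speech_frames(energies, lambda e: e >= 0.01, hangover_frames=1)]
print(kept)  # [1, 2, 3, 6, 7]
```

Here frame 1 reaches the recognizer the moment it is detected, instead of waiting for the pause that follows frame 2. How accurately the per-frame decision is made is exactly where VAD tools differ.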
Initially, LiveCap used Silero VAD, but it would often cut off the ends of sentences, producing unnatural, confusing transcripts. After multiple tests and comparisons, Hakase Shojo switched to the open-source project TEN VAD.
The results were remarkable. TEN VAD offered faster, more accurate detection and proved incredibly stable in Japanese environments. LiveCap fully replaced Silero VAD with TEN VAD, and false detections dropped from a staggering 67% to under 5%.
"By integrating TEN VAD, LiveCap achieved much more natural transcripts in Japanese, reducing user frustration and increasing trust in the product during live usage," Hakase Shojo commented. He also noted that VAD-related technical details are rarely discussed.
But as this story shows, these seemingly simple technical details often hold the key to boosting a product's performance. By openly sharing these behind-the-scenes stories, Hakase Shojo is not only providing streamers with a powerful tool but also offering a valuable insight to fellow voice AI developers: choosing the right tool for the scenario is the fastest way to solve technical challenges.
The power of TEN VAD extends far beyond live subtitling. Its benefits are applicable to a wide range of real-time voice scenarios.
In short, TEN VAD is a foundational capability for building real-time voice applications, whether for live streaming, conversational AI, or voice agents.