Real-time voice cloning is now fast enough to use live, even on stage, letting you speak in the voice of the star of your favorite movie. The technology has come a long way, with systems like NVIDIA’s Jarvis that can create high-fidelity voice clones in around 50 milliseconds, which makes live use possible. For applications like live customer support, video games, or live streams, this fast turnaround is needed so that the cloned voice keeps pace with the speaker and does not interrupt the conversation, which matters for any live dialogue with real people.
Real-time voice technology is already in live use in several industries. At the 2018 Google I/O conference, Google demoed its Duplex AI handling real-time phone calls with a natural-sounding synthetic voice. Duplex has since taken on tasks such as booking reservations and answering queries in real time, showing that this kind of voice technology can be functional and dependable.
Real-time voice cloning also has potential in a variety of other contexts, including live entertainment. Recently, companies like Synthesia have employed voice cloning during live events, where virtual personas interact directly with participants. Live settings only work if the system can process large amounts of audio almost instantaneously, for example responding to what a user has just said quickly enough that the interaction feels natural. The lag is often under 100 milliseconds, which is practically imperceptible in conversation with a live human voice.
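To make the 100 millisecond figure concrete, here is a minimal sketch of a latency budget check. The stage names and timings are illustrative assumptions, not measurements from any real system; only the 100 ms budget and the roughly 50 ms synthesis time come from the figures quoted in this article.

```python
from typing import Dict

# Budget taken from the "under 100 milliseconds" figure in the text.
LATENCY_BUDGET_MS = 100.0


def total_latency_ms(stages: Dict[str, float]) -> float:
    """Sum the per-stage latencies of one audio round trip, in milliseconds."""
    return sum(stages.values())


def within_budget(stages: Dict[str, float],
                  budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """Return True if the whole pipeline fits inside the latency budget."""
    return total_latency_ms(stages) <= budget_ms


# Hypothetical per-stage timings (assumed values, for illustration only).
example_stages = {"capture": 20.0, "synthesis": 50.0, "network": 25.0}

if __name__ == "__main__":
    print(total_latency_ms(example_stages))  # 95.0
    print(within_budget(example_stages))     # True
```

The point of the sketch is that synthesis speed is only one part of the budget: audio capture and network transfer have to fit inside the same 100 ms.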
On the scalability side, real-time voice systems can handle multiple conversations at the same time, which makes them efficient for customer service and call center use cases. Operating at this scale takes hardware capable of processing thousands of voice data streams simultaneously without losing quality. Systems with GPUs optimized for voice synthesis, for example, can manage more than 10,000 interactions in parallel while keeping each voice clone as natural and distinctive as the original speaker.
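As a rough illustration of serving many conversations at once, the sketch below runs a large number of simulated voice streams concurrently with Python's `asyncio`. The `synthesize_chunk` function is a hypothetical stand-in for a real voice-conversion call, and the stream contents and the `max_in_flight` limit are assumed values; a production system would batch work on GPUs rather than run in a single event loop.

```python
import asyncio
from typing import List


async def synthesize_chunk(chunk: bytes) -> bytes:
    """Hypothetical stand-in for a real voice-conversion call."""
    await asyncio.sleep(0)  # yield control, as a real I/O or GPU call would
    return chunk[::-1]      # placeholder transformation, not real synthesis


async def handle_stream(chunks: List[bytes], limit: asyncio.Semaphore) -> int:
    """Process one conversation chunk by chunk; return the bytes produced."""
    produced = 0
    for chunk in chunks:
        async with limit:  # cap how many chunks are being processed at once
            out = await synthesize_chunk(chunk)
        produced += len(out)
    return produced


async def run_streams(num_streams: int, max_in_flight: int = 64) -> List[int]:
    """Run many simulated conversations concurrently."""
    limit = asyncio.Semaphore(max_in_flight)
    # Each simulated stream carries two tiny audio chunks (assumed data).
    streams = [[b"ab", b"cd"] for _ in range(num_streams)]
    return await asyncio.gather(*(handle_stream(s, limit) for s in streams))


if __name__ == "__main__":
    totals = asyncio.run(run_streams(1000))
    print(len(totals))  # 1000
```

The semaphore models a shared resource such as a GPU: the number of open conversations can be large while the number of chunks being processed at any instant stays bounded.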
The use of real-time voice cloning in live scenarios has raised security concerns. Scam calls reported in 2021 featured fraudsters impersonating CEOs on live phone calls with cloned voices, which makes the case for stronger security protocols. Impressive as the technology is, it has to be handled carefully so that it is not abused. Privacy of the voice data used for cloning is another critical factor that must be addressed in deployment, and both Google and OpenAI are already working on cryptographic techniques to protect voices, so that no unauthorized live clone can use your voice.
With its high speed and accuracy, real-time voice cloning is moving into live applications alongside technologies such as speech to text. Learn all about real time voice cloning.