Content AI: Voice AI Takes a Step Forward
By Adam Pease
Recent announcements by large technology providers have shed light on just how far the market for voice synthesis and voice simulation has come.
From Microsoft to Apple, a variety of market leaders have announced impressive accomplishments in voice AI that suggest we are on the verge of naturalistic simulation of the human voice.
This blog discusses these recent announcements and what they mean for the market in general.
AI Gains Its Voice
Last week, Microsoft announced its VALL-E speech synthesizer model, which facilitates text-to-speech generation that simulates the voice of a given speaker.
According to Microsoft, VALL-E can produce a highly realistic simulated voice when it is primed with just three seconds of sample audio. While robotic intonations are still present in VALL-E’s output, it mirrors the input voice closely, creating a vocal deepfake from very little data.
Apple has also made strides in voice synthesis, though in a slightly different direction. Rather than releasing a model that simulates input voices, Apple recently announced a suite of professional-grade digital narrator voices, which it now offers to authors as a replacement for human audiobook narrators.
What Is the Future of Voice?
While advances in voice synthesis by large providers are sure to receive media coverage, many small providers have also been at work building believable voice synthesis.
In some cases, these offerings can even exceed those of major providers, as specialists are able to focus exclusively on their voice product offerings.
Along these lines, we expect the market for voice to continue to expand. Moves by large providers signal that the pressure is on to deliver enterprise-grade voice solutions, which may find use in contact centers, sales contexts, marketing content, and more.
Aragon feels we are on the verge of voice AI technologies that are indistinguishable from real human voices, a shift that will no doubt have disruptive implications for many markets.
Bottom Line
The ability to synthesize convincingly human voice content will transform many industries as organizations realize they no longer need to depend on human agents to create customer service experiences.
Moves by providers like Microsoft and Apple suggest that the standards for voice AI are shifting, and that many customers and users may soon expect a much higher-fidelity experience from virtual agents.
This blog is a part of the Content AI blog series by Aragon Research’s Analyst, Adam Pease.
Missed the previous installments? Catch up here:
Blog 1: RunwayML Foreshadows the Future of Content Creation
Blog 2: NVIDIA Enters the Text-to-Image Fray
Blog 3: Will OpenAI’s New Chatbot Challenge Legacy Search Engines?
Blog 4: Adobe Stock Accepts Generative Content and Meets Backlash
Blog 5: OpenAI Makes a Move for 3D Generative Content with Point-E
Blog 6: ChatGPT and the Problem of Detecting AI-Generated Content