OpenAI’s Voice Engine has been introduced as a new text-to-speech technology, capable of generating a synthetic voice from just a 15-second audio sample of an individual’s voice. This innovative tool can vocalize text prompts as requested, either in the original language of the recorded voice or in various other languages.
“These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI stated in its blog post.
Among the organizations granted early access are Age of Learning, a company specializing in educational technology; HeyGen, a platform for visual storytelling; Dimagi, a developer of healthcare software for field workers; Livox, which produces an AI-powered communication application; and Lifespan, a healthcare network.
How good is Voice Engine by OpenAI?

Below is one reference audio clip along with three samples generated by OpenAI, accompanied by their respective transcripts. You can judge the effectiveness of OpenAI’s Voice Engine for yourself from these examples, although a definitive assessment cannot be made until the feature is widely available to end-users.
https://dataconomy.com/wp-content/uploads/2024/04/reference.mp3

OpenAI first developed its Voice Engine technology in late 2022 and has used it to power the preset voices in its text-to-speech API and the Read Aloud feature in ChatGPT. OpenAI’s product team recently said that the technology was trained on both licensed data and publicly accessible data. OpenAI has indicated that the technology will initially be accessible to roughly 10 developers.
The field of AI-driven text-to-audio conversion is rapidly advancing. While the majority of developments have been in creating instrumental or environmental sounds, the creation of synthetic voices has seen less activity, a situation OpenAI attributes to the ethical concerns involved. Some entities active in this domain include Podcastle and ElevenLabs.
OpenAI has confirmed that its collaborators have committed to adhering to its use policies, which preclude the use of Voice Generation for impersonating individuals or entities without consent. Furthermore, these agreements stipulate that collaborators must obtain clear and voluntary consent from the people whose voices are used, prevent users from generating voices independently, and inform listeners that the voices are synthesized by AI. To ensure the traceability of its audio outputs, OpenAI has incorporated watermarking into the sound clips and is vigilant in overseeing their utilization.
OpenAI proposed a series of measures aimed at mitigating potential risks associated with technologies of this nature. These include transitioning away from the use of voice-based verification for banking access, implementing regulations to safeguard individuals’ voice data in AI applications, enhancing public awareness about AI-generated deepfakes, and creating mechanisms for the monitoring of AI-generated content.
“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” OpenAI said.
Use cases for OpenAI’s Voice Engine feature

OpenAI presents the use cases below as viable examples of Voice Engine’s application, while emphasizing that its potential uses are limited only by one’s imagination:
Featured image credit: Kerem Gülen/Midjourney