OpenAI Previews New Audio Tool That Can Recreate Voices

OpenAI, the high-profile tech startup, is allowing a small group of businesses to test a new system, Voice Engine, that can recreate a person’s voice from a 15-second recording, the company announced today.

Takeaway Points:

  • OpenAI is permitting a small group of businesses to test its new system, Voice Engine.
  • Early demos and use cases come from a preview shared with about 10 developers.
  • However, the company has decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators, and creatives.

OpenAI’s Voice Engine

OpenAI is sharing early results from a test for a feature that can read words aloud in a convincing human voice, highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks.

The company is sharing early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said. OpenAI decided against a wider rollout of the feature, which it briefed reporters on earlier this month.

A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators, and creatives. The company had initially planned to release the tool to as many as 100 developers through an application process, according to the earlier press briefing.


“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build,” the company wrote in a blog post on Friday.

Other AI technology has already been used to fake voices in some contexts. In January, a bogus but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primaries—an event that stoked AI fears ahead of critical global elections.

Unlike OpenAI’s previous efforts at generating audio content, Voice Engine can create speech that sounds like individual people, complete with their specific cadence and intonations. All the software needs is 15 seconds of recorded audio of a person speaking to re-create their voice.

During a demonstration of the tool, Bloomberg listened to a clip of OpenAI Chief Executive Officer Sam Altman briefly explaining the technology in a voice that sounded indistinguishable from his actual speech, but was entirely AI-generated.

“If you have the right audio setup, it’s basically a human-caliber voice,” said Jeff Harris, a product lead at OpenAI. “It’s a pretty impressive technical quality.” However, Harris said: “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”

Voice Engine Use Cases

One of OpenAI’s current developer partners, the Norman Prince Neurosciences Institute at the not-for-profit health system Lifespan, is using the technology to help patients recover their voice. For example, the tool was used to restore the voice of a young patient who lost her ability to speak clearly due to a brain tumor by replicating her speech from an earlier recording for a school project, the company blog post said.

OpenAI’s custom speech model can also translate the audio it generates into different languages. That makes it useful for companies in the audio business, like Spotify Technology SA. Spotify has already used the technology in its own pilot program to translate the podcasts of popular hosts like Lex Fridman. OpenAI also touted other beneficial applications of the technology, such as creating a wider range of voices for educational content for children.

In the testing program, OpenAI is requiring its partners to agree to its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they’re hearing are AI-generated. The company is also embedding an inaudible audio watermark so it can determine whether a piece of audio was created by its tool.

Before deciding whether to release the feature more broadly, OpenAI said it’s soliciting feedback from outside experts.

“It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company added.

OpenAI also wrote that it hopes the preview of its software “motivates the need to bolster societal resilience” against the challenges brought about by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It’s also seeking public education about deceptive AI content and more development of techniques for detecting whether audio content is real or AI-generated.
