Back in 2019, OpenAI refused to release the full version of its GPT-2 text generator, fearing it was “too dangerous” to share publicly. On Thursday, OpenAI’s biggest financial backer, Microsoft, made a similar pronouncement about its new VALL-E 2 voice synthesis AI.

VALL-E 2 is a zero-shot text-to-speech (TTS) AI, meaning it can recreate hyper-realistic speech from just a few seconds of sample audio. Per the research team, VALL-E 2 “surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks.”
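
Microsoft has not released a public interface for VALL-E 2, but the zero-shot pipeline described in the VALL-E line of papers follows a recognizable pattern: encode a short reference clip into discrete acoustic “codec” tokens, have a language model continue those tokens conditioned on the target text, then decode the predicted tokens back into a waveform. The sketch below is purely illustrative; every function in it is a placeholder standing in for a real model, not Microsoft’s code.

```python
# Illustrative sketch of a zero-shot TTS pipeline in the VALL-E style.
# Every function below is a stand-in so the example runs end to end;
# Microsoft has not released VALL-E 2 or any public API for it.
import numpy as np

def encode_reference(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Stand-in for a neural codec encoder: turn a few seconds of
    reference audio into discrete acoustic tokens capturing the voice."""
    prompt = audio[: sample_rate * 3]  # "just a few seconds" of prompt audio
    return (np.clip(prompt, -1, 1) * 511 + 512).astype(np.int64)

def continue_tokens(prompt_tokens: np.ndarray, text: str) -> np.ndarray:
    """Stand-in for the codec language model that predicts new acoustic
    tokens conditioned on the target text and the speaker prompt."""
    rng = np.random.default_rng(len(text))  # fake "conditioning" on the text
    new_tokens = rng.integers(0, 1024, size=16_000)
    return np.concatenate([prompt_tokens, new_tokens])

def decode_tokens(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the codec decoder that maps tokens back to audio."""
    return (tokens.astype(np.float32) - 512.0) / 512.0

if __name__ == "__main__":
    sr = 16_000
    reference = np.sin(2 * np.pi * 220 * np.arange(sr * 3) / sr)  # fake clip
    prompt = encode_reference(reference, sr)
    tokens = continue_tokens(prompt, "Peter Piper picked a peck of pickled peppers.")
    waveform = decode_tokens(tokens)
    print(f"synthesized {waveform.size / sr:.1f}s of placeholder audio")
```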

The system reportedly can even handle sentences that are difficult to pronounce because of their structural complexity or repetitive phrasing, such as tongue twisters.

There is a host of potential beneficial uses for such a system: enabling people with aphasia or amyotrophic lateral sclerosis (commonly known as ALS, or Lou Gehrig’s disease) to speak again, albeit through a computer; applications in education, entertainment, journalism, chatbots, and translation; and accessibility features such as “interactive voice response systems” like Siri. However, the team also recognizes numerous opportunities for the public to misuse the technology, “such as spoofing voice identification or impersonating a specific speaker.”

As such, the AI will be available for research purposes only. “Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public,” the team wrote. “If you suspect that VALL-E 2 is being used in a manner that is abusive or illegal or infringes on your rights or the rights of other people, you can report it at the Report Abuse Portal.”

Microsoft is hardly alone in its efforts to train computers to speak as humans do. Google’s Chirp, ElevenLabs’ Iconic Voices, and Meta’s Voicebox all aim to perform similar functions.

However, such systems have come under ethical scrutiny because they have repeatedly been used to scam unsuspecting victims by mimicking the voice of a loved one or a well-known celebrity. And unlike generated images, there is currently no effective way to “watermark” AI-generated audio.
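
To see why that is hard, consider a toy watermark: add a faint secret noise pattern to a waveform and later detect it by correlation. In the deliberately simple sketch below (not any production scheme), even mild processing, such as the smoothing filter standing in for re-encoding or resampling, sharply weakens the detector’s signal.

```python
# Toy illustration of why audio watermarks are fragile. This is a
# classroom example, not a real watermarking system.
import numpy as np

rng = np.random.default_rng(0)
sr = 16_000
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
mark = rng.standard_normal(sr)                        # secret noise pattern

watermarked = audio + 0.05 * mark                     # faint additive mark

def detect(signal: np.ndarray) -> float:
    """Normalized correlation between a signal and the secret pattern."""
    return float(np.dot(signal, mark) /
                 (np.linalg.norm(signal) * np.linalg.norm(mark)))

# An 8-tap moving average stands in for everyday lossy processing
# (re-encoding, resampling, playback through a phone speaker).
degraded = np.convolve(watermarked, np.ones(8) / 8, mode="same")

print(f"clean copy:       {detect(watermarked):.3f}")
print(f"after processing: {detect(degraded):.3f}")
```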
