Promise and peril: Sonantic’s emotional AI voices

[This short piece from PC Gamer nicely captures both prospective benefits and significant concerns about a future saturated with presence experiences, focused in this case on a new AI-driven text-to-speech technology that creates compelling illusions. The original story includes the 4:25 minute video it describes, and the video is also available on YouTube and on Sonantic’s website. More information is in stories from Yahoo! News and The Times (the latter behind a subscriber firewall). –Matthew]

Oh good, someone invented the ‘first AI capable of crying’

A company has created a text-to-speech technology that doesn’t just read words, it simulates an acting performance.

By Christopher Livingston
May 14, 2020

Text-to-speech is a prolific and extremely useful technology, but it doesn’t burst into tears often enough while reading to you, does it? There’s a fix for that from Sonantic, a company that claims to have invented “the world’s first AI capable of crying.” Finally! AI can be just as sad as we all are.

Okay, the AI isn’t actually sad, it’s engaging in a text-to-speech process that doesn’t just read the words you give it, but simulates the emotion of an acting performance. I guess in an age of CGI and Deepfakes, computer-simulated voice acting was the next troubling item on the list.

“The aim of the company is to really capture this deep emotion using machine learning,” says Felix Vaughhan, deep learning researcher at Sonantic. “And the first thing we focused on was sadness.”

You can see it for yourself in the video [in the original story]. According to Sonantic, the voices of the mother and daughter in the video “are entirely computer generated.” Check it out. It’s pretty wild.

The video also includes some of Sonantic’s creators, who are unfortunately pretty darn cagey about explaining how it all works. The process does involve real human actors who help build Sonantic’s artificial voices, one of whom is also shown in the video. Actors who partner with Sonantic “can earn passive income when clients around the world use their synthetic voice within commercially released projects,” according to the website.

Users, meanwhile, will be able to import a script, choose from a selection of “voice models” to perform the dialogue, and swap between different voices with “just a few clicks.” You’ll be able to “direct” the AI by adjusting its performance for more or less emotion, projection, pacing, and other tweakable settings.

While the technology does seem pretty neat, there’s also something kinda icky about it, because this isn’t how acting or directing works. Actors aren’t a bundle of sliders and knobs, and directing a performance isn’t done by tweaking a few settings. I definitely understand the appeal for game developers to be able to change a few lines of dialogue at the last minute or adjust the tone of a performance, but acting is, y’know, an art form. It’s weird to see it boiled down to assigning a number to the ‘Emotion’ meter on a website.

But this is our weird, troubling future and it’s clear we’re all going to be replaced by computers eventually. Maybe a computer is writing this article. Maybe a computer is reading it, too. There’s no way to tell anymore.
