Amazon’s Alexa Prize challenges experts to create a bot capable of small talk

[Amazon’s Alexa Prize challenges AI experts to create a more natural (presence-evoking) social bot; the original, unabridged version of this long story from Wired describes the background, the competition process and results, and notes that Amazon has another Alexa Prize contest in 2018. –Matthew]


Amazon is in a pitched battle with its rivals to bring truly conversational AI into our homes. So the company is staging a contest—a mad dash toward an outlandish goal: Cook up a bot capable of small talk.

James Vlahos
February 27, 2018

The first interactor, a muscular man in his fifties with a shaved head and a black V-neck sweater, walks into a conference room and sits in a low-slung blue armchair before a phalanx of video cameras and studio lights. He’s brightly lit. The rest of the room is totally dark. He gazes at a black, hockey-puck-shaped object, an Amazon Echo, on a small table in front of him. “Alexa,” he says, “let’s chat.”

“Good morning, my friend,” a female voice replies with synthetic agreeability, a purplish ring of light pulsing atop the Echo. “Or is it afternoon? I’m not quite sure.”

“It’s morning,” the man replies.

“Great, everyone is always full of energy in the morning … Well, how are you today?”

“I’m great, how are you?”

“Well, I’m always busy, as people keep chatting with me … But listen, this is just hilarious, it’s a recent post on Reddit: ‘Toothpaste is just mouth soap.’ ”

Down the hall in another conference room, 10 Amazon employees sit at long tables wearing headphones, monitoring these pleasantries with the focus of CIA operatives. In yet another room, three men sit in booths cordoned off by black curtains. They, too, wear headphones and have cameras trained on them. Finally, in a control center, members of a video crew monitor all the feeds on a large, tiled screen. Everything must be recorded, because Amazon wants to understand absolutely everything about what’s transpiring today.

This extravagantly staged operation, which took place last November, is the final judging session in a months-long competition. Amazon has challenged 15 teams of some of the world’s best computer science graduate students to build “a socialbot that can converse coherently and engagingly with humans on popular topics for 20 minutes.” If any team succeeds, its members will snare academic glory and the promise of brilliant future careers. (Consider that some of the most impressive alums of the Darpa Grand Challenges, an early set of autonomous vehicle competitions, went on to run the self-driving car divisions of Google, Ford, Uber, and General Motors.) They will also walk away with a $1 million purse—which Amazon has called the Alexa Prize.

Amazon, in case you haven’t noticed, has spent the past few years pursuing voice AI with a voraciousness rivaling that of its conquest of retail. The company has more than 5,000 people working on the Alexa platform. And since just 2015, it has reportedly sold more than 20 million Echoes. One day, Amazon believes, AIs will do much more than merely control lights and playlists. They will drive cars, diagnose diseases, and permeate every niche of our lives. Voice will be the predominant interface, and conversation itself—helpful, informative, companionable, entertaining—will be the ultimate product.

But all this early success and ambition has plunged Amazon off a cliff, and into a wide and treacherous valley. Today Alexa, like all voice assistants, often fails to comprehend the blindingly obvious. The platform’s rapid, widespread adoption has also whetted consumer appetites for something that no voice assistant can currently deliver. Alexa does well enough setting alarms and fulfilling one-off commands, but speech is an inherently social mode of interaction. “People are expecting Alexa to talk to them just like a friend,” says Ashwin Ram, who leads Alexa’s AI research team. Taking part in human conversation—with all its infinite variability, abrupt changes in context, and flashes of connection—is widely recognized as one of the hardest problems in AI, and Amazon has charged into it headlong.

The Alexa Prize is hardly the first contest that has tried to squeeze more humanlike rapport out of the world’s chatbots. Every year for the better part of three decades, a smattering of computer scientists and hobbyists has gathered to compete for something called the Loebner Prize, in which contestants try to trick judges into believing a chatbot is human. That prize has inspired its share of controversy over the years—some AI researchers call it a publicity stunt—along with plenty of wistful, poetic ruminations on what divides humans from machines. But the Alexa Prize is different in a couple of ways. First, the point isn’t to fool anyone that Alexa is a person. Second, the scale of the competition—the sheer human, financial, and computational firepower behind it—is massive. For several months of 2017, during an early phase of the contest, anyone in the US who said “Alexa, let’s chat” to their Amazon voice device was allowed to converse with a randomly selected contest bot; they were then invited to rate the conversation they’d had from one to five stars. The bots had millions of rated interactions, making the Alexa Prize competition, by orders of magnitude, the largest chatbot showdown the world has ever seen.

That showdown culminated last November in a room with a blue armchair and a bunch of lights.

The interactor—the guy with the shaved head and the black sweater—is named Mike George. Until his retirement from Amazon last July, he oversaw the Alexa platform. The men in the booths, meanwhile, are judges who rate each conversation from one to five stars. If a judge thinks that a conversation has gone off the rails, he can press a button on a handheld wand; if a second judge does so, the conversation and the session timer are halted. Nobody knows which bot is which. Not the interactors, not the judges.


The fevered quest for conversational AI has pitted Amazon, Apple, Facebook, Google, and Microsoft against one another in a battle for two vital resources. The first is finite: top-shelf PhDs in computer science, who, owing to their scarcity, now command starting salaries well into the six figures. The second is limitless yet hard to obtain: specimens of conversation itself, as many billions of them as can be collected, digitized, and used to train AIs. Against this backdrop, the Alexa Prize was a masterstroke for Amazon. The contest served as both a talent search for the sharpest graduate students in the world and a chance to pick their brains for a bargain price. And it provided Amazon with an opportunity to amass a conversational data trove that no other technology company has.


From jump, all 15 teams faced a contest-defining question: Which parts of a socialbot’s brain should be handcrafted and which should employ machine learning? Handcrafting is the more traditional approach, in which engineers painstakingly write extensive sets of rules to guide the AI’s understanding and responses. Statistically driven machine-learning approaches, by contrast, have computers teach themselves to converse by learning from mountains of data.

Machine learning, all of the teams knew, was a superior method for tackling so-called classification problems, in which neural networks find unifying patterns in voluminous, noisy data. Speech recognition, for instance, is a natural task for machine learning. But when it comes to getting chatbots not just to transcribe speech into text but to say something back, machine learning has a long way to go. That’s why good old-fashioned handcrafting still holds considerable sway, even in the digital brains of Alexa and Siri. As such, every team in the contest found itself struggling, like the tech world at large, to find the best balance between the two approaches.


The best conversation in the finals happens with George. While hardly the sparkling repartee of a Tom Stoppard play, it immediately stands out. The chat opens with George asking to talk about robots. Rather than tediously reading a Wikipedia-derived summary of what a robot is, the socialbot jokes, “What if Daft Punk really are two robots who just want to make music?”

“That’s funny,” George says, laughing.

The bot shares some recent news about Elon Musk and robots. Tossing the ball back to George, the bot asks, “Have you heard anything about Elon Musk?”

“I have,” George brags. “I know him.”

“Did you know that Elon Musk was executive producer for the movie Thank You for Smoking?”

“No,” George says, sounding impressed. He and the bot go back and forth before settling on books as their next topic. They take a couple of turns, and then the socialbot says, “You know what I was thinking? Eventually, someone will be the last person to ever be alive. Their name won’t be in any history books.”

“Oh, that is sad!” George replies.

“I’m sorry to make you upset. Would you like to continue our discussion on books?”

“Let’s talk about the book The Dark Tower.”

“Sorry, I’m not sure I know anything about the book The Dark Tower. Would you like to talk about the movie The Dark Tower?”


“Super. Did you watch this movie? It was released this year.”

“I did.”

The bot asks George if he liked Matthew McConaughey’s performance. George did. Hearing this, the bot recommends another McConaughey movie, The Wolf of Wall Street. A couple of turns later, the bot makes a joke. “You know what I was thinking? Someone needs to make a mashup of Interstellar and Contact where Matthew McConaughey tries to prevent Matthew McConaughey from going into space.”

George guffaws.

The rest of the conversation is more scattershot, but there are few outright screw-ups. Music, sports. Ten minutes. The movie The Boondock Saints. Twelve minutes. Santa Claus and his unintended role in climate change. Thirteen minutes. George asks the bot to sing. It complies. Fifteen minutes. Music and movies again, health care and Bill Gates. The timer hits 19 minutes and the conversation is still going.


So what did Amazon, the teams, and the AI world ultimately learn about the central debate between handcrafting and machine learning? The winner, the team from the University of Washington (UW), had shot for the middle. The handcrafting-heavy Czech team, meanwhile, had finished second. And Heriot-Watt, the finalist most aggressive about using machine learning, placed third. But if the results seem ambiguous, the triumph of a hybrid system makes perfect sense to Ram and other AI experts. We’re just beginning to figure out how best to combine the two approaches, Ram says.

Everyone in the contest also agrees on what would be most helpful to push machine learning forward: more conversational data. That, ultimately, is Amazon’s own contest booty. Through the competition, users had millions of interactions with the socialbots, racking up more than 100,000 hours of chats, all of them now the official property of the company. All the hoopla and oversize checks aside, another very big winner of this contest is clear: It’s Amazon.

This entry was posted in Presence in the News.