“A little too human-like”: Figure 01 is a multi-tasking conversational robot infused with OpenAI tech

[A new video featuring interactions with the AI-powered robot Figure 01 demonstrates technical advancements likely to evoke new levels of medium-as-social-actor presence. It’s getting lots of attention from the public, press and experts in AI and robotics. Details are in the stories below from Mashable and Decrypt; the 2:34 minute video is featured in the original versions of both stories and is available on YouTube. –Matthew]

[Image: Source: Decrypt]

It’s like ChatGPT with a body: Watch creepy demo of OpenAI-powered robot ‘Figure 01’

I prefer my ChatGPT disembodied, thank you very much.

By Kimberly Gedeon
March 14, 2024

A creepy demo of “Figure 01,” a humanoid, conversational robot, has hit the internet — and I can’t believe it’s not a deleted scene from I, Robot.

In the demo, Figure 01, packed with OpenAI tech, is asked what it can “see.” Showing off its visual recognition prowess, the avant-garde robot accurately describes what’s in front of it: a red apple, a drying rack with dishes, and the man who asked Figure 01 the question.

OK, a bit uncanny, but it’s nothing we haven’t seen before, right? For example, last year, Google showed off how the AI model Gemini could recognize stimuli placed in front of it, from a blue rubber duck to various hand-drawn illustrations (though it was later discovered that slick editing slightly exaggerated its capabilities).

But then, the man asks, “Can I have something to eat?” Figure 01 grabs the apple, clearly recognizing that it’s the only edible object on the table, and hands it to him.

Er, are we sure that Will Smith isn’t going to pop up any time soon?

How does the Figure 01 robot work?

What, exactly, is underpinning Figure 01’s seamless interaction with a human? It’s a new Visual Language Model (VLM) that transforms Figure 01 from a clunky hunk of junk into a sci-fi-esque, futuristic robot that is a little too human-like. (The VLM stems from a collaboration between OpenAI and Figure, the startup behind Figure 01.)

After handing over the apple, Figure 01 reveals that it can tackle several tasks at the same time when asked, “Can you explain why you [gave me the apple] while you pick up this trash?”

While recognizing what’s trash (and what’s not) and placing the proper items into what Figure 01 identifies as a bin, the robot explains that it offered the man an apple because it was the only thing in front of him that could be eaten. That’s some impressive multitasking!

Finally, the man asks Figure 01 how well it thinks it did. In a conversational manner, the robot says, “I-I think I did pretty well. The apple found its new owner, the trash is gone, and the tableware is right where it belongs.”

According to Brett Adcock, the founder of Figure, Figure 01 has onboard cameras that feed the VLM data that helps it “understand” the scene, allowing the robot to interact smoothly with the human in front of it. Alongside Adcock, Figure 01 is the brainchild of key players from Boston Dynamics, Tesla, Google DeepMind, and Archer Aviation.

Taking a dig at Elon Musk’s Optimus robot, Adcock boasted that Figure 01 is not teleoperated. In other words, unlike Optimus, which went viral for folding a shirt, Figure 01 can operate independently.

Adcock’s ultimate goal? To train a super-advanced AI system to control billions of humanoid robots, potentially revolutionizing multiple industries. Looks like I, Robot is a lot more real than we thought.

[From Decrypt; see the original version for embedded posts from X]

AI Start-Up Figure Shows Off Conversational Robot Infused With OpenAI Tech

Figure introduced a humanoid robot that one engineer said exhibits “common sense,” answering questions and performing tasks simultaneously.

By Jason Nelson
March 13, 2024

[snip]

On Twitter, [the man in the video, Figure’s Senior AI Engineer Corey] Lynch explained the Figure 01 project in more detail. “Our robot can describe its visual experience, plan future actions, reflect on its memory, and explain its reasoning verbally,” he wrote in an extensive thread.

According to Lynch, they feed images from the robot’s cameras and transcribed text from speech captured by onboard microphones to a large multimodal model trained by OpenAI.

Multimodal AI refers to artificial intelligence that can understand and generate different data types, such as text and images.
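As a rough illustration of the pipeline Lynch describes, here is a minimal Python sketch of the input side: camera frames and a transcript of the user’s speech are packed into a single request for a multimodal model. Everything in it (the message format and the stub functions) is an assumption made for this example, not Figure’s or OpenAI’s actual code.

```python
# Hypothetical sketch of the input side of the pipeline Lynch describes:
# onboard camera images plus transcribed speech are sent to one multimodal model.
# The message format and the stub functions below are illustrative assumptions,
# not Figure's or OpenAI's actual interface.
import base64
from typing import List

def transcribe(audio: bytes) -> str:
    """Stub speech-to-text; a real system would run an ASR model here."""
    return "Hey Figure 01, what do you see right now?"

def grab_camera_frame() -> bytes:
    """Stub camera read; a real system would return the latest JPEG frame."""
    return b"<jpeg bytes from the onboard camera>"

def build_request(frames: List[bytes], transcript: str) -> dict:
    """Pack images and the user's transcribed speech into one multimodal message."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": transcript}]
        + [
            {"type": "image", "data": base64.b64encode(f).decode("ascii")}
            for f in frames
        ],
    }

request = build_request([grab_camera_frame()], transcribe(b"<mic audio>"))
# `request` would then be appended to the conversation history and sent to the
# multimodal model, which replies with text (spoken via text-to-speech) and,
# per Lynch, a choice of which learned behavior to run next.
```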

Lynch emphasized that Figure 01’s behavior was learned, run at normal speed, and not controlled remotely.

“The model processes the entire history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text-to-speech,” Lynch said. “The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU and executing a policy.”
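To make that description concrete, here is a minimal, hypothetical sketch of such a decision loop in Python: one model call reads the conversation history, returns a spoken reply plus the name of a learned behavior, and the robot speaks the reply while the chosen behavior runs. The function names and data structures are invented for illustration and are not Figure’s actual implementation.

```python
# Hypothetical sketch of the decision loop Lynch describes: one model reads the
# whole conversation history (text and past images), produces a spoken reply,
# and picks which learned closed-loop behavior to execute. Every function and
# data structure here is an assumption for illustration, not Figure's code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Turn:
    role: str                                   # "user" or "robot"
    text: str
    images: List[bytes] = field(default_factory=list)

def model_step(history: List[Turn]) -> tuple:
    """Stub for the multimodal model: returns (spoken reply, behavior name)."""
    return ("Sure thing.", "hand_over_apple")

def speak(text: str) -> None:
    print(f"[TTS] {text}")                      # text-to-speech stand-in

def run_behavior(name: str) -> None:
    # A real system would load that behavior's neural-network weights onto the
    # GPU and execute the closed-loop policy; here we just log the choice.
    print(f"[policy] executing learned behavior: {name}")

history: List[Turn] = [Turn("user", "Can I have something to eat?")]
reply, behavior = model_step(history)           # one model does both jobs
speak(reply)
run_behavior(behavior)
history.append(Turn("robot", reply))
```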

Lynch explained that Figure 01 is designed to describe its surroundings concisely and can apply “common sense” to decisions, like inferring that dishes will be placed in a rack. It can also parse vague statements, such as hunger, into actions, like offering an apple, all the while explaining its actions.

The debut sparked a passionate response on Twitter, with many people impressed with the capabilities of Figure 01—and more than a few adding it to the list of mileposts on the way to the singularity.

“Please tell me your team has watched every Terminator movie,” one replied.

“We gotta find John Connor as soon as possible,” another added.

For AI developers and researchers, Lynch provided a number of technical details.

“All behaviors are driven by neural network visuomotor transformer policies, mapping pixels directly to actions,” Lynch said. “These networks take in onboard images at 10hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200hz.”
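Those two rates imply roughly 20 action steps per camera frame. The short Python sketch below only illustrates that timing arithmetic with a stubbed-out policy; the function names and structure are assumptions, not Figure’s code.

```python
# Hypothetical sketch of the rate structure Lynch quotes: a visuomotor policy
# takes onboard images at 10 Hz and emits 24-DOF actions (wrist poses plus
# finger joint angles) at 200 Hz, i.e. about 20 action steps per image.
# The policy itself is stubbed; only the timing arithmetic comes from the quote.
IMAGE_HZ = 10
ACTION_HZ = 200
ACTIONS_PER_IMAGE = ACTION_HZ // IMAGE_HZ       # 20 actions per camera frame
DOF = 24                                        # wrist poses + finger joints

def policy(image: bytes) -> list:
    """Stub transformer policy: map one image to a short chunk of actions."""
    return [[0.0] * DOF for _ in range(ACTIONS_PER_IMAGE)]

def send_to_actuators(action: list) -> None:
    assert len(action) == DOF                   # 24 joint targets per step

def control_loop(frames: list) -> None:
    for image in frames:                        # arrives every 100 ms (10 Hz)
        for action in policy(image):            # sent every 5 ms (200 Hz)
            send_to_actuators(action)

control_loop([b"<frame>"] * 3)                  # three frames, about 0.3 s of control
```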

Figure 01’s impactful debut comes as policymakers and global leaders attempt to grapple with the proliferation of AI tools into the mainstream. While most of the discussion has been around large language models like OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude AI, developers are also looking for ways to give AI physical humanoid robotic bodies.

Figure AI and OpenAI did not immediately respond to Decrypt’s request for comment.

“One is a sort of utilitarian objective, which is what Elon Musk and others are striving for,” UC Berkeley Industrial Engineering Professor Ken Goldberg previously told Decrypt. “A lot of the work that’s going on right now—why people are investing in these companies like Figure—is that the hope is that these things can do work and be compatible,” he said, particularly in the realm of space exploration.

Along with Figure, others working to merge AI with robotics include Hanson Robotics, which in 2016 debuted its Desdemona AI robot.

“Even just a few years ago, I would have thought having a full conversation with a humanoid robot while it plans and carries out its own fully learned behaviors would be something we would have to wait decades to see,” Lynch said on Twitter. “Obviously, a lot has changed.”
