Embodied AI: Big tech is developing powerful AI brains for real-world robots

[Not only are researchers linking conversational AI to digital avatars (as in an ISPR Presence News post last week), making them more likely to evoke medium-as-social-actor presence, they’re also using AI to make embodied robots more capable of human-like behavior and interactions. Here are highlights of a March 2023 VICE story titled “Big tech is now developing powerful AI brains for real-world robots. Building on recent AI advancements to allow robots to complete tasks autonomously in the real world is a ‘major step forward,’ researchers say.”:

Researchers at Google and the Berlin Institute of Technology have released an AI model called PaLM-E this week that combines language and vision capabilities to control robots, allowing them to complete tasks autonomously in the real world… According to the researchers, this is the largest Visual Language Model (VLM) reported to date, with 562 billion parameters. This AI has a ‘wide array of capabilities’ which includes math reasoning, multi-image reasoning, and chain-of-thought reasoning. The researchers wrote in a paper that the AI uses multi-task training to transfer skills across tasks, rather than being trained on individual tasks. […]

PaLM-E is based on PaLM, Google’s previous large language model; the ‘E’ in the name stands for ‘embodied’ and refers to the model’s interaction with physical objects and robotic control. PaLM-E also builds on Google’s RT-1, a model that processes robot inputs such as camera images and task instructions and outputs motor commands. The AI uses ViT-22B, a vision transformer model that performs tasks such as image classification, object detection, and image captioning.

The robot is able to generate its own plan of action in response to commands using the model. When the robot was asked to ‘bring me the rice chips from the drawer,’ PaLM-E was able to guide it to go to the drawers, open the top drawer, take the rice chips out of the drawer, bring them to the user, and put them down. The robot was able to do this even with a human disturbance: a researcher knocked the rice chips back into the drawer the first time the robot picked them up. PaLM-E recovers from such disturbances by continuously analyzing data from its live camera.
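The replanning behavior described above can be illustrated with a toy sketch (this is not Google’s actual code; the function names, the `ToyWorld` environment, and the hand-written rules standing in for the vision-language model are all hypothetical). The key pattern is that each high-level step is proposed from a fresh observation, so a disturbance simply changes the observation and the plan adapts:

```python
# Illustrative closed-loop control sketch: observe, propose the next
# high-level step, execute, and repeat until the goal is reached.

def propose_next_step(goal, observation):
    """Stand-in for a vision-language model: maps the goal plus the
    latest observation to the next high-level action as text."""
    if observation["chips_location"] == "drawer_closed":
        return "open the top drawer"
    if observation["chips_location"] == "drawer_open":
        return "pick up the rice chips"
    if observation["chips_location"] == "gripper":
        return "bring the chips to the user"
    return "done"

def run_task(goal, world):
    """Execute steps one at a time, re-planning from each new observation."""
    steps = []
    while True:
        action = propose_next_step(goal, world.observe())
        if action == "done":
            return steps
        steps.append(action)
        world.execute(action)

class ToyWorld:
    """Toy environment; a 'disturbance' knocks the chips back once."""
    def __init__(self):
        self.state = "drawer_closed"
        self.disturbed = False

    def observe(self):
        return {"chips_location": self.state}

    def execute(self, action):
        if action == "open the top drawer":
            self.state = "drawer_open"
        elif action == "pick up the rice chips":
            if not self.disturbed:
                self.disturbed = True  # human knocks the chips back in
                self.state = "drawer_open"
            else:
                self.state = "gripper"
        elif action == "bring the chips to the user":
            self.state = "delivered"

world = ToyWorld()
steps = run_task("bring me the rice chips from the drawer", world)
# Because each step is re-planned from a fresh observation, the
# "pick up" action is simply attempted again after the disturbance.
```

Because the planner never commits to a fixed sequence, the disturbance costs one extra step rather than derailing the whole task.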

‘PaLM-E generates high-level instructions as text; in doing so, the model is able to naturally condition upon its own predictions and directly leverage the world knowledge embedded in its parameters,’ the researchers wrote. ‘This enables not only embodied reasoning but also question answering, as demonstrated in our experiments.’ […]

Google is not the only company testing out a new multimodal AI and how to incorporate large language models in robots. Microsoft released its research on how it extended the capabilities of ChatGPT to robotics.”

The story from The New York Times below considers some of the larger questions regarding how best to link a robot’s mind and body. See the original version for seven more images. –Matthew]

[Image: Embodied, a start-up based in Pasadena, Calif., has designed what the company calls “the world’s first A.I. robot friend.” Credit: Alex Welsh for The New York Times]

Can Intelligence Be Separated From the Body?

Some researchers question whether A.I. can be truly intelligent without a body to interact with and learn from the physical world.

By Oliver Whang
April 11, 2023; Updated April 20, 2023

What is the relationship between mind and body?

Maybe the mind is like a video game controller, moving the body around the world, taking it on joy rides. Or maybe the body manipulates the mind with hunger, sleepiness and anxiety, something like a river steering a canoe. Is the mind like electromagnetic waves, flickering in and out of our light-bulb bodies? Or is the mind a car on the road? A ghost in the machine?

Maybe no metaphor will ever quite fit because there is no distinction between mind and body: There is just experience, or some kind of physical process, a gestalt.

These questions, agonized over by philosophers for centuries, are gaining new urgency as sophisticated machines with artificial intelligence begin to infiltrate society. Chatbots like OpenAI’s GPT-4 and Google’s Bard have minds, in some sense: Trained on vast troves of human language, they have learned how to generate novel combinations of text, images and even videos. When primed in the right way, they can express desires, beliefs, hopes, intentions, love. They can speak of introspection and doubt, self-confidence and regret.

But some A.I. researchers say that the technology won’t reach true intelligence, or true understanding of the world, until it is paired with a body that can perceive, react to and feel around its environment. For them, talk of disembodied intelligent minds is misguided — even dangerous. A.I. that is unable to explore the world and learn its limits, in the ways that children figure out what they can and can’t do, could make life-threatening mistakes and pursue its goals at the risk of human welfare.

“The body, in a very simple way, is the foundation for intelligent and cautious action,” said Joshua Bongard, a roboticist at the University of Vermont. “As far as I can see, this is the only path to safe A.I.”

At a lab in Pasadena, Calif., a small team of engineers has spent the past few years developing one of the first pairings of a large language model with a body: a turquoise robot named Moxie. About the size of a toddler, Moxie has a teardrop-shaped head, soft hands and alacritous green eyes. Inside its hard plastic body is a computer processor that runs the same kind of software as ChatGPT and GPT-4. Moxie’s makers, part of a start-up called Embodied, describe the device as “the world’s first A.I. robot friend.”

The bot was conceived, in 2017, to help children with developmental disorders practice emotional awareness and communication skills. When someone speaks to Moxie, its processor converts the sound into text and feeds the text into a large language model, which in turn generates a verbal and physical response. Moxie’s eyes can move to console you for the loss of your dog, and it can smile to pump you up for school. The robot also has sensors that take in visual cues and respond to your body language, mimicking and learning from the behavior of people around it.
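The pipeline described above can be sketched in a few lines (the function names and hand-written rules below are illustrative stand-ins, not Embodied’s actual software: a real system would run a speech-recognition model and call a large language model where the placeholders sit):

```python
# Minimal sketch of the Moxie-style pipeline: audio is transcribed to
# text, the text is fed to a language model, and the verbal reply is
# paired with a physical response (eyes, smile).

def speech_to_text(audio):
    # Placeholder: a real system would run a speech-recognition model.
    return audio["transcript"]

def language_model(text):
    # Placeholder: a real system would call an LLM such as GPT-4.
    if "sad" in text or "lost" in text:
        return "I'm so sorry. I'm here for you."
    return "That sounds exciting! Tell me more."

def choose_gesture(reply):
    # Map the verbal reply to a physical response.
    return "consoling eyes" if "sorry" in reply else "smile"

def respond(audio):
    text = speech_to_text(audio)
    reply = language_model(text)
    return reply, choose_gesture(reply)

reply, gesture = respond({"transcript": "I lost my dog today"})
```

The design point is the modularity the article goes on to question: the “mind” (the language model) and the “body” (the gesture layer) are separate stages wired together, rather than a single system trained through physical interaction.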

“It’s almost like this wireless communication between humans,” said Paolo Pirjanian, a roboticist and the founder of Embodied. “You literally start feeling it in your body.” Over time, he said, the robot gets better at this kind of give and take, like a friend getting to know you.

Researchers at Alphabet, Google’s parent company, have taken a similar approach to integrating large language models with physical machines. In March, the company announced the success of a robot they called PaLM-E, which was able to absorb visual features of its environment and information about its own body position and translate it all into natural language. This allowed the robot to represent where it was in space relative to other things and eventually open a drawer and pick up a bag of chips.

Robots of this kind, experts say, will be able to perform basic tasks without special programming. They could ostensibly pour you a glass of Coke, make you lunch or pick you up from the floor after a bad tumble, all in response to a series of simple commands.

But many researchers doubt that the machines’ minds, when structured in this modular way, will ever be truly connected to the physical world — and, therefore, will never be able to display crucial aspects of human intelligence.

Boyuan Chen, a roboticist at Duke University who is working on developing intelligent robots, pointed out that the human mind — or any other animal mind, for that matter — is inextricable from the body’s actions in and reactions to the real world, shaped over millions of years of evolution. Human babies learn to pick up objects long before they learn language.

The artificially intelligent robot’s mind, in contrast, was built entirely on language, and often makes common-sense errors that stem from training procedures. It lacks a deeper connection between the physical and theoretical, Dr. Chen said. “I believe that intelligence can’t be born without having the perspective of physical embodiments.”

Dr. Bongard, of the University of Vermont, agreed. Over the past few decades, he has developed small robots made of frog cells, called xenobots, that can complete basic tasks and move around their environment. Although xenobots look much less impressive than chatbots that can write original haikus, they might actually be closer to the kind of intelligence we care about.

“Slapping a body onto a brain, that’s not embodied intelligence,” Dr. Bongard said. “It has to push against the world and observe the world pushing back.”

He also believes that attempts to ground artificial intelligence in the physical world are safer than alternative research projects.

Some experts, including Dr. Pirjanian, recently conveyed concern in a letter about the possibility of creating A.I. that could disinterestedly steamroll humans in the pursuit of some goal (like efficiently producing paper clips), or that could be harnessed for nefarious purposes (like disinformation campaigns). The letter called for a temporary pause in the training of models more powerful than GPT-4.

Dr. Pirjanian noted that his own robot could be seen as a dangerous technology in this regard: “Imagine if you had a trusted companion robot that feels like part of the family, but is subtly brainwashing you,” he said. To prevent this, his team of engineers trained another program to monitor Moxie’s behavior and flag or prevent anything potentially harmful or confusing.

But any kind of guardrail to protect against these dangers will be difficult to build into large language models, especially as they grow more powerful. While many, like GPT-4, are trained with human feedback, which imbues them with certain limitations, the method can’t account for every scenario, so the guardrails can be bypassed.

Dr. Bongard, as well as a number of other scientists in the field, thought that the letter calling for a pause in research could bring about uninformed alarmism. But he is concerned about the dangers of our ever improving technology and believes that the only way to suffuse embodied A.I. with a robust understanding of its own limitations is to rely on the constant trial and error of moving around in the real world.

Start with simple robots, he said, “and as they demonstrate that they can do stuff safely, then you let them have more arms, more legs, give them more tools.”

And maybe, with the help of a body, a real artificial mind will emerge.

