[Researchers at the University of California, Davis are enhancing social presence, communication and learning in virtual settings by giving AI-based agents the ability to better mimic how humans use gestures. See the original version of this story for three more images. For more context, see a report in The Conversation (coincidentally published on the same day) titled “People who talk with their hands seem more clear and persuasive – new research.” –Matthew]

[Image: Ozgur Demir, a Ph.D. student in computer science and researcher in the UC Davis Motion Lab, left, and Professor Michael Neff, simulate interacting in the virtual space, with Neff as the instructor and Demir as the student, to gather data on how best to build virtual instructors. Credit: Mario Rodriguez/UC Davis]
Bringing Human Communication to Virtual Teaching
By Jessica Heath
December 4, 2025
Imagine a virtual space in which an AI instructor effectively guides students through an interactive lesson plan, like building an electronic circuit. Students in remote locations, or those who need extra stimulation, can receive instruction from a virtual agent that can meaningfully communicate with them using verbal and nonverbal methods, catering to multiple learning styles and enhancing academic opportunities.
This scenario is what Michael Neff, a professor of computer science at the University of California, Davis, could help bring to reality with his research, which lies at the intersection of computation and movement. In the Motion Lab, Neff investigates topics like character animation technology, gestures and nonverbal communication, and virtual reality, to name a few.
As an early researcher in the field of mapping audio data to gestures, Neff is witnessing its evolution and studying how developing realistic AI agents could strengthen educational experiences in the classroom and beyond.
Engineering a Virtual Classroom
Teaching in a virtual reality, or VR, environment came to Neff via Lee Martin, a UC Davis professor of education who works on maker education. Maker education is an exploratory learning process in which students are given parts to build with and a prompt like, “Can you make a circuit that turns a light on?” The “teacher” acts more as a facilitator, guiding students when they get stuck.
During the pandemic, however, when in-person gatherings were limited, this sort of education didn’t work well on platforms like Zoom or YouTube, where the facilitator and the students are in separate physical spaces and interact in a two-dimensional video space.
“Video offerings and online learning like Khan Academy have become a real asset in learning spaces, and they’ve been very effective,” Neff said. “In this maker education setting, you don’t have both people in a shared space, which makes it more difficult.”
Neff, Martin and Joshua McCoy, an associate professor of computer science whose research focuses on game technology and artificial intelligence and design, considered whether they could build a classroom environment for maker education in a virtual reality space. So, they did.
The researchers created a VR environment prototype that allows students — the target age is 13 to 17 — to build simulated electronic circuits with a virtual tutor with whom they can troubleshoot problems. Currently, the sessions are being conducted with a live tutor who is projected into VR using motion capture, and the researchers are collecting data.
Eventually, Neff says, the goal is to use the gathered data — from where the students are looking to the location of the objects to the motion capture of the facilitator — to create learning modules where an AI-driven tutor is providing advice and guidance. If the AI agent can conduct realistic nonverbal communication, it will greatly enhance the students’ learning.
“This data should help us build autonomous versions of these agents in limited domains,” Neff said, “and help us advance, more generally, nonverbal models, because we’re able to build models that take this special context into account.”
Laying the Groundwork for Intelligent Animation
Before the field of creating more lifelike embodied AI exploded in 2018, Neff was one of the few people working on mapping gestures from audio. His early work aimed to simplify character animation by redesigning software tools to incorporate concepts that enhance the expressiveness of motion.
His work with fellow researcher Michael Kipp, whom he met while finishing up his postdoctoral placement at the Max Planck Institute in Germany, delved into the gesture modeling space.
Soon after they started working together, they published one of the first statistical models for generating nonverbal behavior, built from video of two talk show hosts — Jay Leno and Germany’s Marcel Reich-Ranicki — which they annotated for the gestures the hosts were performing.
“We used video of them talking and gesturing, and then we annotated that video for the gestures they were performing,” Neff said. “There’s some correlation between the gesture and the speech. So, we basically built a statistical model that allowed us to build a probabilistic map between what they were saying and what they were doing.”
The researchers then gave the model new text, and it synthesized new motion based on that text.
“We used Star Wars sentence examples, like ‘They’re working on building the Death Star,’” Neff said. “The model analyzes that text and realizes that the Death Star is an object. It finds where Jay Leno talked about objects and what kind of gestures he used for objects. It does statistical matching.”
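To make that process concrete, here is a minimal sketch of such a probabilistic map in Python. The gesture labels, semantic categories, keyword lists and counts are all invented for illustration; the actual model was learned from annotated talk-show video and is not reproduced here.

```python
import random

# Hypothetical counts of gesture types observed (in annotated video) for each
# coarse semantic category of the speech; a real model would learn these from
# data rather than use hand-picked numbers.
gesture_counts = {
    "object":   {"container_shape": 6, "deictic_point": 3, "beat": 1},
    "location": {"deictic_point": 5, "sweep": 4, "beat": 1},
    "other":    {"beat": 7, "rest": 3},
}

# A toy tagger standing in for the text-analysis step; a real system would use
# proper semantic parsing rather than a keyword list.
OBJECT_WORDS = {"death", "star", "circuit", "light"}
LOCATION_WORDS = {"there", "here", "front", "side"}

def tag_word(word: str) -> str:
    w = word.lower().strip(".,!?'\"")
    if w in OBJECT_WORDS:
        return "object"
    if w in LOCATION_WORDS:
        return "location"
    return "other"

def sample_gesture(category: str) -> str:
    """Sample a gesture type in proportion to how often it was annotated
    for this semantic category."""
    counts = gesture_counts[category]
    return random.choices(list(counts), weights=list(counts.values()), k=1)[0]

if __name__ == "__main__":
    sentence = "They're working on building the Death Star"
    for word in sentence.split():
        category = tag_word(word)
        print(f"{word:10s} {category:9s} {sample_gesture(category)}")
```

The logic mirrors the quote above: the text analysis assigns each word a coarse category, and the model then samples a gesture in proportion to how often gestures of each type co-occurred with that category in the annotated data.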
From Audio to Action
For that early work, the only input was text. In his current work enhancing the maker education space, Neff and his team are replacing those simple probabilistic models with more complex deep learning versions that map from both audio and text to gestures.
Audio, Neff said, can provide a lot of information for mapping. How someone varies their voice or uses emphasis is typically a good indicator of when they will gesture.
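The article does not spell out the architecture, but one common way to frame a mapping from audio and text to gesture is as a sequence model over aligned per-frame features. The sketch below, written in PyTorch, is purely illustrative: the layer choices, feature dimensions and pose representation are assumptions, not the Motion Lab's actual model.

```python
import torch
import torch.nn as nn

class AudioTextToGesture(nn.Module):
    """Illustrative sketch (not the lab's published architecture) of a network
    that maps per-frame audio features plus text embeddings to a sequence of
    gesture/pose parameters."""

    def __init__(self, audio_dim=26, text_dim=300, hidden_dim=256, pose_dim=57):
        super().__init__()
        # Recurrent encoder over the combined audio + text sequence; prosodic
        # cues in the audio (pitch, energy, emphasis) help the model learn
        # when gestures tend to occur.
        self.encoder = nn.GRU(audio_dim + text_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Decode each timestep's hidden state into a pose vector.
        self.to_pose = nn.Linear(2 * hidden_dim, pose_dim)

    def forward(self, audio_feats, text_embeds):
        # audio_feats: (batch, frames, audio_dim), e.g. MFCC or prosody features
        # text_embeds: (batch, frames, text_dim), word embeddings aligned to frames
        x = torch.cat([audio_feats, text_embeds], dim=-1)
        hidden, _ = self.encoder(x)
        return self.to_pose(hidden)  # (batch, frames, pose_dim)

if __name__ == "__main__":
    model = AudioTextToGesture()
    audio = torch.randn(2, 120, 26)   # two clips, 120 frames of audio features
    text = torch.randn(2, 120, 300)   # word embeddings aligned to the same frames
    print(model(audio, text).shape)   # torch.Size([2, 120, 57])
```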
In one of his projects, Neff is trying to model representational space, referring to the way people gesture when they are explaining locations, time or entities, to capture more of the semantics of what is going on. For example, someone might discuss Republicans and Democrats, placing each group in a different location and gesturing back and forth between the two.
“When people are gesturing, they set up an abstract representational space in front of them. They’ll say, ‘There’s what we did today,’ [and place today in front of them] and then, ‘There’s what we did yesterday [and place yesterday to the side].’ They build up this space, refer back to it, put objects in locations and refer back to those locations. So, for this project, we’re looking at things like, ‘What is the locational relationship between concepts?’”
This is important because humans have evolved with nonverbal communication, and there is strong evidence that people take in information from gestures that differs from what they get through speech. Gestures can even be more effective, in some cases, than verbal communication, particularly when conveying emotions.
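To make the representational-space idea concrete, here is a toy sketch of that bookkeeping, with invented slot names and an invented assignment rule: the first mention of a concept claims a spot in the speaker's gesture space, and later mentions point back to the same spot.

```python
from dataclasses import dataclass, field

@dataclass
class GestureSpace:
    """Toy model of the 'representational space' idea described above. The
    slot names and round-robin assignment are invented for illustration."""
    slots: list = field(default_factory=lambda: ["left", "center", "right"])
    placements: dict = field(default_factory=dict)

    def locate(self, concept: str) -> str:
        # First mention of a concept claims the next free slot; later mentions
        # reuse the slot it was given, mirroring how speakers point back to a
        # location already associated with a concept.
        if concept not in self.placements:
            self.placements[concept] = self.slots[len(self.placements) % len(self.slots)]
        return self.placements[concept]

if __name__ == "__main__":
    space = GestureSpace()
    for concept in ["today", "yesterday", "today"]:
        print(concept, "->", space.locate(concept))
```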
The Next Wave of Embodied Intelligence
With the advent of deep learning models, Neff’s field has exploded. Industry powerhouses like Nvidia are on a continuous quest for graphics with more human-like movement, developing foundation models for nonverbal behavior and for mapping images to language and language to images.
Neff believes this happened for a few reasons. The models have become better and faster, making this type of work more attainable. Additionally, with the vast amount of audio and video content available, the datasets for training are becoming more accessible.
For Neff, the fact that models are now much more capable means that the next steps in incorporating AI into virtual education opportunities are imminent.
“We have voice-based bots like Siri and Alexa. The next step is: Can you actually have these things be embodied? They could be embodied in robots or in virtual characters for a range of applications, from personal assistants to games to tutoring systems. That embodiment could bring so much additional value.”