Oculus wants to help VR avatars look normal when they talk

[This hasn’t gotten a lot of press coverage, but it could be an important step toward more effective presence illusions. The story is from Engadget, which features a 13-second demo video using the avatar pictured below; a second video using an animated robot is available on YouTube. More details from Oculus via Geeky Gadgets are included below. –Matthew]

OVRLipSync Oculus Unity plugin (avatar)

[Image: From Geeky Gadgets]

Oculus wants to help VR avatars look normal when they talk

It’s all thanks to a clever Unity plugin

Chris Velazco

Remember all those Hong Kong kung-fu movies with dubbing so poor that the actors’ mouths kept flapping after the words had stopped? That was charming. What’s less charming is the possibility of stone-faced avatars poorly mouthing dialogue, detracting ever so slightly from the immersive power of virtual reality worlds. That’s why we’re all slightly excited that Oculus released a beta Unity plugin called OVRLipSync.

The plugin lets developers sync an avatar’s mouth movements to either existing audio or input from a microphone without too much hassle. Granted, the results aren’t wholly life-like, but it’s not a bad showing for some brand new software. More importantly, we’re left wondering how many new VR titles will end up taking advantage of this thing. Our guess? Lots. Its potential importance stretches beyond just making NPCs look more natural, too. Oculus is working on shared VR experiences with Oculus Social, so maybe we’ll get those ornate virtual chatrooms with fully animated avatars that were promised in cyberpunk novels after all.

[Geeky Gadgets includes more details from the Oculus documentation: ]

OVRLipSync is an add-on plugin and set of scripts used to sync avatar lip movements to speech sounds from a canned source or microphone input. OVRLipSync requires Unity 5.x Professional or Personal or later, targeting Android or Windows platforms, running on Windows 7, 8, or 10. OS X 10.9 and later are also currently supported.

Our system currently maps to 15 separate viseme targets: sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou. These visemes correspond to expressions typically made by people producing the speech sound by which they’re referred, e.g., the viseme sil corresponds to a silent/neutral expression, PP resembles the expression made when pronouncing the first syllable of “popcorn,” FF the first syllable of “fish,” and so forth.
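
[To make the idea of viseme targets a little more concrete, here is a minimal Python sketch of how an application might represent the 15 targets and decide which mouth shape to show for one slice of audio. This is not the OVRLipSync API. The function names, the per-frame list of weights, and the smoothing factor are all assumptions made for illustration; only the viseme names and their order come from the documentation quoted above.]

```python
from typing import List, Sequence

# The 15 viseme targets named in the quoted documentation, in the order listed.
VISEMES = ("sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou")


def dominant_viseme(weights: Sequence[float]) -> str:
    """Return the name of the viseme with the largest weight.

    Assumption (hypothetical input format, not taken from the plugin):
    `weights` holds one score per viseme for a single audio frame,
    in the same order as VISEMES.
    """
    if len(weights) != len(VISEMES):
        raise ValueError(f"expected {len(VISEMES)} weights, got {len(weights)}")
    # max() keeps the first index on ties, so an all-equal frame maps to "sil".
    best_index = max(range(len(VISEMES)), key=lambda i: weights[i])
    return VISEMES[best_index]


def smooth(previous: Sequence[float], current: Sequence[float],
           alpha: float = 0.5) -> List[float]:
    """Blend two frames so the mouth eases between shapes instead of snapping.

    `alpha` is an illustrative smoothing factor: 1.0 uses only the current
    frame, 0.0 keeps the previous frame unchanged.
    """
    return [(1.0 - alpha) * p + alpha * c for p, c in zip(previous, current)]
```

[In this sketch, a frame whose weights are all zero resolves to sil, the silent/neutral expression, and a frame dominated by the second weight resolves to PP. An avatar rig would typically drive one blend shape per viseme from the smoothed weights rather than from the single dominant name.]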

