Industry veteran on requirements for successful telepresence in virtual conferences and beyond

[This post on Medium provides an industry veteran’s thoughtful perspective on the requirements for successful telepresence experiences, both as a replacement for attending in-person conferences and in other contexts. See the original version for a different image and follow the links for more information. –Matthew]

[Image: Source: Liv Erickson blog]

Coronavirus Exposes Our Need For True Telepresence

Let’s meet nowhere

Avi Bar-Zeev, Design and Technology Leader (fmr. HoloLens, Apple, Google Earth, Second Life, Disney VR)
March 2, 2020

I’ve been convinced for 30 years that telepresence is the “killer app” for extended reality (XR). Now more than ever, it seems like it could be an important replacement for face-to-face meetings and conferences.

This same idea led to the first HoloLens prototypes and a visually rich TED Talk by my former boss, Microsoft technical fellow Alex Kipman. The first two augmented reality (AR) devices Microsoft has released aren’t yet great for representing “live” holograms of people, mainly due to their narrow fields of view. Unless a person fits entirely within this field of view, we perceive them as being virtually “vivisected” by the device, which can be disconcerting. Still, the tech is moving in a direction that will let us someday see remote, whole people sitting across the table, seamlessly blended in with the real world.

Humans are deeply social. Most are reasonably empathic. All of us have a need for positive human attention. Anything resembling an XR future has to embrace these qualities to the core.

As I write this in March 2020, many of us are reeling from the cancelation or postponement of key tech conferences due to the coronavirus. Some people have argued that meeting in VR would resolve the current dilemma. Why wouldn’t entirely XR conferences work well in the world they’re promoting?

The discussion reminds me of a similar one in 2007, when people advocated for a hybrid of Google Earth and Second Life. I blogged at the time about why this fusion would be a terrible idea, mainly because it was unclear what we would all do together on a mostly empty, movie-set-like copy of the world.

Many companies have tried to crack this nut. Second Life has been around for about 20 years now, with a viable business plan that monetizes land (aka our mutual need to be relatively near each other). The VR-plus-blockchain startup High Fidelity (founded by Second Life founder Philip Rosedale) started over with a more scalable tech stack and has since pivoted to audio-only collaboration. Other “VR chat” companies have been purchased along the way — or died trying.

Why is telepresence tech so hard?

Here are the top requirements for success in telepresence, based on real experiences:

  1. Co-presence is the perception that we truly inhabit a space together. You know you’ve achieved this when you take off your VR headset and genuinely wonder, “Where did everybody go?”
  2. Emotional fidelity is the ability to emote naturally and correctly perceive the emotions of others (as seen via natural eye contact and joint attention, spatial audio voice, and faithful body representation and animation).
  3. Embodiment is the sense that we are each real, and we manifest a physical form that can have impact and be impacted by events around us.
  4. Persistence is the degree to which these effects combine over time instead of periodically resetting.

Consider these qualities as they’re represented in Twitter or Reddit, even if your literal embodiment there is just a “profile” and text label instead of some rich 3D representation.

These websites act a lot like town squares. They let us meet and speak our minds or just share cat GIFs. There is definitely strong persistence on both sites. When we spend so much time building up a real history, followers, and whatnot, most of us are not likely to act antisocially. If we do get ourselves banned for bad behavior, it’s more painful to start over. And if we use our real identities, the effects of good or bad behavior may well spill over into real life. Trolls, on the other hand, rarely make the same level of investment.

But there’s a distinct lack of co-presence on these websites. We call both of these “sites,” but our brains don’t really believe these spaces actually surround us when we “visit.” Twitter and Reddit just don’t have the sense of “being there” that high-quality VR with true presence might provide. There is no way to explain VR presence without experiencing it, except perhaps by understanding your current experience of being somewhere. Try briefly imagining that your place isn’t real, and ask why you know that it is.

As for emotional fidelity on these websites, we receive emotional cues mainly through text, augmented with a few emoji and visual memes. It takes a talented writer (poet, author) to seed their text such that others will experience the intended emotions. It’s a bit like writing code to run on other people’s brains, repeatedly and reliably. Most of us fail at it.

But even with those deficits of true telepresence, Twitter and Reddit are far more successful than any VR chat experience has been thus far. XR meetings and conferences will need to carry the same attractive qualities, while also taking the experience to a whole new level.

Perhaps when we imagine conference-like presentations in VR, they look like live YouTube streams, except we can visualize and interact with an audience better than with a list of unembodied comments. (Here, the bar to exceed is quite low.)

Even when YouTube says a million people have watched a popular video, we never really feel that in the same way we might feel a crowd at a concert or sporting event. As an introvert, I don’t often seek that kind of social density. But I can easily imagine extroverts getting very “charged up” by larger crowds.

There is a quality of real-life conferences that even the most introverted of us will still tend to seek — face-to-face connection with people we may only ever see at these meetings. There are also more people with whom we might make real connections through random interactions at these conferences.

For example, when I go to the AWE conference every May/June, I rarely make it to the scheduled talks. The hallway or side conversations usually take up most of my time. The highlights for me are the dinners and occasional parties with industry veterans. That’s where the deep conference happens.

Even for first-timers, the awe of just being in the same space as those veterans offers validation. I also feel that when I go to science fiction writing conventions, where I’m unknown but can casually meet great authors.

Solving for virtual conferences means solving telepresence in general. How do we make this at least as good as face-to-face conversations, starting one on one and then bridging out to small ad hoc groups?

How do we add the natural serendipity of meeting people we don’t know and having those starter conversations that lead to lasting relationships?

Mozilla Hubs PM Liv Erickson has some great thoughts on how we might use XR for replacing live conferences, or not, in light of the coronavirus outbreak. Take a moment to take in her excellent insights.

The technical issues Erickson raises are very real. I’ve helped prototype schemes to handle thousands, up to maybe millions, of proximate users by distributing the back-end work and refining the interaction models. This is several orders of magnitude harder than scaling Twitter or Facebook to the same size, in my opinion.

But more important, for whatever reason, most of us can’t even seem to get our microphones configured correctly for mere audio calls on our computers. Every one of my attempts to use current VR telepresence has required up to an hour of prep time to make it “seamless.”

In real life, this stuff needs to “just work,” the way phone calls used to just work, but ideally with better user flow than waiting for a remote person to answer a ringing line. I’ve spent some time thinking through that UX flow in particular, with some results I’ll share down the road.

For today, Mozilla Hubs is a good starting point for simplicity in creating VR meeting spaces and basic 3D interactions. When talking to co-workers you know fairly well, it’s already better than talking on the phone (especially group conference calls), because the spatial audio quality is decent. Our brains are wired for spatial audio; there’s a phenomenon called the “cocktail party effect” in which we naturally tune out background noise and focus on the person in front of us. Hubs is better than video chat in some ways, since you can see realistic head direction (not eye contact yet) among groups when wearing VR headsets. But it would not yet be great for management one-on-ones or seeing family, because it’s still missing many of the rich interpersonal emotional qualities we mentioned earlier.

The Mozilla Hubs team added the important ability to share 2D desktop windows and screens in VR, which should make it possible to perform standard PowerPoint or Keynote presentations live. The update rate for this is modest, so video and animations may not work as well. Keynote and PowerPoint also tend to take over your screens when presenting, so having a second computer may be best unless you can simplify to showing PDFs or moving through your presentation in the normal “edit” mode.

It will take some time to work through all the issues to make a simple group presentation work flawlessly in XR, including, at some point, new native XR presentation tools.

There are several other reasons XR tech conferences aren’t quite ready for prime time. One is that it’s already pretty hard today to sell you a television set that’s twice as good as your current one when you can only view the new one through your current, more limited set. How can we show you the difference, except by sleight of hand? Selling you a next-gen XR experience usually requires next-gen hardware. Seeing that in person is the only way to really appreciate the differences. Nvidia may have some advantages with its new all-digital conference, since it can do its 3D rendering in the cloud.

Finally, and most perplexingly, there is something to be said for the relative scarcity of conferences as part of their appeal. Twitter, Reddit, and YouTube are constantly available. Special time-limited events may actually focus and enhance their appeal because they get us all to show up at the same time and place. But in XR, this limitation is arbitrary. Without travel and production costs for booths and more, why not have a yearlong XR conference with new content coming daily? Would that be better? I don’t know. But it may be inevitable.

In the meantime, we still largely live and meet in the real world as we push closer to the XR ideal.

Update: Here are some additional services you may want to try if you’re exploring the current state of telepresence for XR:

  1. RoadToVR — 26 VR Apps for Remote Work
  2. Next.Reality-News — 10 Remote Collaboration Apps for XR