[In an interview with The Verge’s Decoder podcast, the CEO of Zoom describes a future in which users send AI-based digital versions of themselves to Zoom meetings, versions that could be told apart from the real users only with authentication software. I’ve included the relevant portions of the interview transcript below, but you can listen to the full 1 hour and 5 minute interview and/or read the full transcript at the original story. –Matthew]
[Image: Credit: Photo illustration by The Verge / Photo: Zoom]
The CEO of Zoom wants AI clones in meetings
Zoom founder Eric Yuan has big ambitions in enterprise software, including letting your AI-powered ‘digital twins’ attend meetings for you.
By Nilay Patel, editor-in-chief of The Verge, host of the Decoder podcast, and co-host of The Vergecast.
Jun 3, 2024
Today, I’m talking with Zoom CEO Eric Yuan — and let me tell you: this conversation is nothing like what I expected. Eric started Zoom after working at Cisco and realizing there was an opportunity to make videoconferencing simpler and easier to use. And he was right: Zoom is now a household name — especially after usage exploded during the pandemic.
But usage has since come down, and Zoom faces a number of business challenges he and I talked about. Yet, it turns out, Eric wants Zoom to be much, much more than just a video chat platform. He wants to take on Microsoft and Google in the enterprise software market by making docs and email and other productivity tools like chat. And like virtually every other company, Zoom now has a big investment in AI — and Eric’s visions for what that AI will do are pretty wild.
See, Eric really wants you to stop having to attend Zoom meetings yourself. You’ll hear him describe how he thinks one of the big benefits of AI at work will be letting us all create something he calls a “digital twin” — essentially a deepfake avatar of yourself that can go to Zoom meetings on your behalf and even make decisions for you while you spend your time on more important things, like your family.
I’ll just warn you: I tried to ask a bunch of the usual Decoder questions during this conversation, but once we got to digital twins going to Zoom meetings for people, I had a lot of follow-up questions. How many digital twins might you have? How will they all stay in sync? Can you trust them? What work will be left if everyone is sending their digital twins to all the meetings?
Eric was more than game to talk about these ideas with me, and this became a very different kind of CEO interview on Decoder. I haven’t stopped thinking about it since we recorded it. I think you’re going to like it.
Okay, Eric Yuan, founder and CEO of Zoom. Here we go.
This transcript has been lightly edited for length and clarity.
[snip]
Let’s start at the very beginning. Everyone knows Zoom as a videoconferencing app. You’ve just released a bunch of new features. You have workplace features. You have AI features. How do you think about Zoom right now?
I think, for now, we are embarking on a 2.0 journey. You are right on. Looking back at 1.0, it was more about building some applications; videoconferencing is one of them. Our slogan was “Work Happy.” Right now, [when] you look at a 2.0, it is different. It’s “Work Happy with the Zoom AI Companion” and everything really about Workplace, the entire collaboration platform as well as AI.
[snip]
Let’s say the team is waiting for the CEO to make a decision or maybe some meaningful conversation, my digital twin really can represent me and also can be part of the decision making process. We’re not there yet, but that’s a reason why there’s limitations in today’s LLMs. Everyone shares the same LLM. It doesn’t make any sense. I should have my own LLM — Eric’s LLM, Nilay’s LLM. All of us, we will have our own LLM. Essentially, that’s the foundation for the digital twin. Then I can count on my digital twin. Sometimes I want to join, so I join. If I do not want to join, I can send a digital twin to join. That’s the future.
How far away from that future do you think we are?
I think in a few years, we’ll get there, but we’re just at the beginning. The reason why is because of two problems. The first problem is today, look at the large language model itself — it just started. A lot of potential opportunities, but it’s not there yet. Another thing is we have to make sure you have a customized version. Essentially, [for] every human being, you have to have your own version of LLM based on all the data, based on all the context around you. So you have your LLM; I have my LLM. I might have multiple versions of LLM. Sometimes I know I’m not good at negotiations. Sometimes I don’t join a sales call with customers. I know my weakness before sending a digital version of myself. I know that weakness. I can modify the parameter a little bit.
You think you would have a dial be like “be a better salesperson”?
Exactly. For that meeting I say, “Hey, tune that parameter to have better negotiation skills, send that version, and join.”
When you think about this as expressed in Zoom, the videoconferencing app, do you think there would be a 3D avatar of you, like the Vision Pro faces that Apple is doing, or do you think it would just be a voice?
To start, it’ll probably be voice, but for sure, down the road, the experience would be immersive, like with Vision Pro and Meta Quest 3. I think again, this is also the beginning, but the experience down the road, that’s a 3D version of yourself that can mimic you very well, so you can’t know if it’s a real person or just a 3D version.
This is a lot of stacked up technology problems to solve, right? There’s a realistic 3D avatar. There’s an LLM that you might be able to tune with different parameters that you can trust. I think a lot of people don’t trust LLMs today. They hallucinate a lot. There’s everybody in the world being culturally okay with talking to a digital avatar. That’s a lot of problems. How is Zoom organized to solve those problems and get to this vision today?
Even a few years ago, we talked about the vision at Zoomtopia, which is our user conference. Imagine a world where you and I live in Silicon Valley. I live in San Jose; you are in San Francisco. We may not be in the same place. Whenever you and I have a call down the road, it’ll feel like you and I are sitting together. I shake your hand, and you feel my hand. I give you a hug, and you feel my intimacy as well. Plus, even two people who speak a different language, the real-time translation will also work extremely well. And if you and I don’t want to meet, I send a digital version for myself, and you’ll have exactly the same conversation. I think that’s the vision we painted a few years ago.
But how to get there? I think two things. First of all, luckily, I think we’ve already started. Look at the industry. I think there are two technologies that are going to help us to start that. One is AI — another is AR. Vision Pro, the Meta Quest 3 — it’s just starting. Look at today and all the generative AI [products]; it’s just started. I do not think those technologies are ready yet, but they will help us get there.
[snip]
Do you think it will be possible to reliably detect digital twins or deepfakes?
Of course. Because, again, it’s more like: Let’s say I send a digital twin of myself. It will be authenticated. You will say this is the real digital twin of Eric or the digital twin of somebody else, given the critical technology and a lot of new stuff. I think it’s very feasible to detect. Otherwise, you send a digital twin, and it may not be myself — you’re meeting somebody else.
Let me ask you this: what do you think is the limit? If I have a digital twin of me that can go to Zoom meetings or appear at conferences, should I be able to send a hundred digital twins out in the world? A thousand? Is there a limit? Do you think there should be a limit?
I don’t think there’s a limit. It really depends on yourself. More like how today, [if] you join a meeting, you want to wear a black jacket or you want to wear a white jacket. It depends on the day. Again, they all belong to you, though, right? You have multiple versions of digital twin. Some versions, I just want one—
Will all the digital twins be connected to each other? So if I have a hundred digital twins of me out in the world and one is being asked if the next car I want should be red or blue and another one is asking me if the next car I want should be white or black and they answer different colors, how will they know?
So, again, first of all, you control all your digital twins.
Sure.
That’s one thing. The second thing: your digital twins, multiple digital twins, are different based on your training. One digital twin is really more like a sales expert; another digital twin of yourself is more like an engineering expert. Again, you manage that. Whenever you send a digital twin of yourself to join any other meetings, any other digital context, we know that they’ll be authentic given AI-based authentication. They know that it’s one of the digital twins of Nilay or one of the digital twins of Eric. They know that, too.
Do you imagine this is all happening in Zoom’s data center? So I log in to my Zoom account, I’ve got my engineer digital twin and my designer digital twin and whoever else, and I’m saying, “Alright, go off, go do stuff,” in a Zoom interface, or do I own these and I’m connecting to Zoom?
I think the interface is Zoom’s interface. However, how to manage that is very different. That’s the reason why I like crypto technology. It’s more like fully distributed. I do not think you can store the digital twin of yourself to our server. You will store somewhere you feel very safe, likely maybe on the edge, on your phone, desktop, or maybe somewhere you trust more, like where you store your Bitcoin. Something like that. I do not think you give your digital twin to each of the vendors. You use Zoom, use other services. I do not think that’s our architecture.
I’m just so curious because you have to build a lot of this to enable this. This is the vision. And some of these questions just seem fundamental. What is the rate at which you can deploy digital twins? That seems like a big decision we all need to make together. Have you thought about that? That someone might want to send a thousand digital twins out into the world? That might be a weird outcome.
You are so right. That’s the reason why AI is full of uncertainties, but in reality, it will happen. Whenever I train using my consumer LLM and have multiple digital twins, my friend also trusts that. That works for sure. There’s some side effects, such as how to leverage all the distributing computing technology, AI technology, AR technology, crypto technology. That’s the reason why in the next 10 or 20 years, it’s more exciting than the past 20 years.
[snip]
Well, this is great, Eric. Thank you so much for being on Decoder. You’re going to have to come back soon, maybe as a digital twin someday.
Or maybe in person.
Even better.
Yes. Thank you, my friend. I really appreciate it.