
The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei


Lenny's Podcast

https://www.youtube.com/watch?v=Ctjiatnd6Xk

Lenny Rachitsky: A lot of people call you the godmother of AI. The work you did was actually the spark that brought us out of the AI winter.

Dr. Fei-Fei Li: In the middle of 2015, middle of 2016, some tech companies avoided using the word AI because they were not sure if AI was a dirty word. 2017-ish was the beginning of companies calling themselves AI companies.

Lenny Rachitsky: There's this line, I think from when you were presenting to Congress: "There's nothing artificial about AI. It's inspired by people. It's created by people, and most importantly, it impacts people."

Dr. Fei-Fei Li: It's not like I think AI will have no impact on jobs or people. In fact, I believe that whatever AI does, currently or in the future, is up to us. It's up to the people. I do believe technology is a net positive for humanity, but I think every technology is a double-edged sword. If we're not doing the right thing as a society, as individuals, we can screw this up as well.

Lenny Rachitsky: You had this breakthrough insight of, okay, we can train machines to think like humans, but they're just missing the data that humans use to learn as children.

Dr. Fei-Fei Li: I chose to look at artificial intelligence through the lens of visual intelligence because humans are deeply visual animals. We need to train machines with as much information as possible on images of objects, but objects are very, very difficult to learn. A single object has infinite possibilities for how it can appear in an image. In order to train computers with tens of thousands of object concepts, you really need to show them millions of examples.

Lenny Rachitsky: Today, my guest is Dr. Fei-Fei Li, who's known as the godmother of AI. Fei-Fei has been responsible for and at the center of many of the biggest breakthroughs that sparked the AI revolution that we're currently living through. She spearheaded the creation of ImageNet, which was basically her realizing that AI needed a ton of clean, labeled data to get smarter, and that data set became the breakthrough that led to the current approach to building and scaling AI models. She was chief AI scientist at Google Cloud, which is where some of the biggest early technology breakthroughs emerged from. She was director of SAIL, Stanford's Artificial Intelligence Lab, which many of the biggest AI minds came out of. She's also co-creator of Stanford's Human-Centered AI Institute, which is playing a vital role in the direction that AI is taking. She's also been on the board of Twitter. She was named one of Time's 100 Most Influential People in AI. She's also on a United Nations advisory board. I could go on.

In our conversation, Fei-Fei shares a brief history of how we got to today in the world of AI, including this mind-blowing reminder that 9 to 10 years ago, calling yourself an AI company was basically a death knell for your brand because no one believed that AI was actually going to work. Today, it's completely different. Every company is an AI company. We also chat about her take on how she sees AI impacting humanity in the future, how far current technologies will take us, why she's so passionate about building a world model and what exactly world models are, and most exciting of all, the launch of the world's first large world model, Marble, which comes out just as this podcast does. Anyone can go play with it at marble.worldlabs.ai. It's insane. Definitely check it out. Fei-Fei is incredible and way too under the radar for the impact she's had on the world, so I am really excited to have her on and to share her wisdom with more people.
Figma Make is a different kind of vibe coding tool. Because it's all in Figma, you can use your team's existing design building blocks, making it easy to create outputs that look good and feel real and are connected to how your team builds. Stop spending so much time telling people about your product vision and instead show it to them. Make code-back prototypes and apps fast with Figma Make. Check it out at figma.com/lenny.
Fei-Fei, thank you so much for being here and welcome to the podcast.

Dr. Fei-Fei Li: I'm excited to be here, Lenny.

Lenny Rachitsky: I'm even more excited to have you here. It is such a treat to get to chat with you. There's so much that I want to talk about. You've been at the center of this AI explosion that we're seeing right now for so long. We're going to talk about a bunch of the history that I think a lot of people don't even know about how this whole thing started, but let me first read a quote from Wired about you just so people get a sense, and in the intro I'll share all of the other epic things you've done. But I think this is a good way to just set context. "Fei-Fei is one of a tiny group of scientists, a group perhaps small enough to fit around a kitchen table, who are responsible for AI's recent remarkable advances."

A lot of people call you the godmother of AI, and unlike a lot of AI leaders, you're an AI optimist. You don't think AI is going to replace us. You don't think it's going to take all our jobs. You don't think it's going to kill us. So I thought it'd be fun to start there, just what's your perspective on how AI is going to impact humanity over time?

Dr. Fei-Fei Li: Yeah, okay, so Lenny, let me be very clear. I'm not a utopian, so it's not like I think AI will have no impact on jobs or people. In fact, I'm a humanist. I believe that whatever AI does, currently or in the future, is up to us. It's up to the people. So I do believe technology is a net positive for humanity, if you look at the long course of civilization. Fundamentally, we're an innovative species. If you look from the written record thousands of years ago to now, humans just kept innovating ourselves and innovating our tools, and with that, we make lives better, we make work better, we build civilization, and I do believe AI is part of that. So that's where the optimism comes from. But I think every technology is a double-edged sword, and if we're not doing the right thing as a species, as a society, as communities, as individuals, we can screw this up as well.

Lenny Rachitsky: There's this line, I think from when you were presenting to Congress: "There's nothing artificial about AI. It's inspired by people. It's created by people, and most importantly, it impacts people." I don't have a question there, but what a great line.

Dr. Fei-Fei Li: Yeah, I feel this pretty deeply. I started working in AI two and a half decades ago, and I've been advising students for the past two decades. Almost every student who graduates from my lab, I remind them that their field is called artificial intelligence, but there's nothing artificial about it.

Lenny Rachitsky: Coming back to the point you just made about how it's kind of up to us where this all goes, what is it you think we need to get right? How do we set things on a path? I know this is a very difficult question to answer, but just, what's your advice? What do you think we should be keeping in mind?

Dr. Fei-Fei Li: Yeah, how many hours do we have?

Lenny Rachitsky: How do we align AI? There we go. Let's solve it.

Dr. Fei-Fei Li: So I think people should be responsible individuals no matter what we do. This is what we teach our children, and this is what we need to do as grownups as well. No matter which part of AI development, AI deployment, or AI application you are participating in (and most likely many of us, especially as technologists, are at multiple points), we should act like responsible individuals and care about this. Actually, care a lot about this. I think everybody today should care about AI because it is going to impact your individual life. It is going to impact your community, it's going to impact society and future generations. And caring about it as a responsible person is the first, and also the most important, step.

Lenny Rachitsky: Okay, so let me actually take a step back and kind of go to the beginning of AI. Most people started hearing and caring about AI, as what it's called today, just, I don't know, a few years ago when ChatGPT came out. Maybe it was like three years ago.

Dr. Fei-Fei Li: Three years ago, in almost one more month, three years ago.

Lenny Rachitsky: Wow, okay. And that was ChatGPT coming out. Is that the milestone you have in mind?

Dr. Fei-Fei Li: Yes.

Lenny Rachitsky: Okay, cool. That's exactly how I saw it. But very few people know there was a long, long history of people working on what was called machine learning back then, and there were other terms, and now everything is just AI. There was a long period of a lot of people working on it. And then there's what people refer to as the AI winter, when most people just about gave up: okay, this idea isn't going anywhere. And then the work you did was essentially the spark that brought us out of the AI winter and is directly responsible for the world we're in now, where AI is all we talk about. As you just said, it's going to impact everything we do. So I thought it'd be really interesting to hear from you a brief history of what the world was like before ImageNet, the work you did to create ImageNet, why that was so important, and then what happened after.

Dr. Fei-Fei Li: It is, for me, hard to keep in mind that AI is so new for everybody when I've lived my entire professional life in AI. There's a part of me that finds it just so satisfying to see a personal curiosity that I started barely out of teenagehood now become a transformative force of our civilization. It genuinely is a civilization-level technology. So that journey is about 30 years, or 20-something, 20-plus years, and it's just very satisfying. So where did it all start? Well, I'm not even a first-generation AI researcher. The first generation really dates back to the '50s and '60s, and Alan Turing was ahead of his time in the '40s, daring humanity with the question, "Are there thinking machines?" And of course he had a specific way of testing this concept of a thinking machine, which is a conversational chatbot, and by his standard we now have a thinking machine.

But that was just a more anecdotal inspiration. The field really began in the '50s, when computer scientists came together and looked at how we can use computer programs and algorithms to do things that had only been possible through human cognition. And that was the beginning. Among the founding fathers at the Dartmouth workshop in 1956, we have Professor John McCarthy, who later came to Stanford and who coined the term artificial intelligence. Between the '50s, '60s, '70s, and '80s, it was the early days of AI exploration, and we had logic systems, we had expert systems, and we also had early explorations of neural networks. And then came the late '80s, the '90s, and the very beginning of the 21st century. That stretch of about 20 years was actually the beginning of machine learning, the marriage between computer programming and statistical learning.
And that marriage brought a very, very critical concept into AI, which is that a purely rule-based program is not going to account for the vast amount of cognitive capabilities that we imagine computers can have. So we have to use machines to learn the patterns. Once the machines can learn the patterns, they have a hope of doing more things. For example, if you give them three cats, the hope is not just for the machines to recognize these three cats. The hope is the machines can recognize the fourth cat, the fifth cat, the sixth cat, and all the other cats. And that's a learning ability that is fundamental to humans and other animals. And we, as a field, realized, "We need machine learning." So that was up until the beginning of the 21st century. I entered the field of AI literally in the year 2000. That's when my PhD began at Caltech.
And so I was one of the first-generation machine learning researchers, and we were already studying this concept of machine learning, especially neural networks. I remember one of my first courses at Caltech was called neural networks, but it was very painful. It was still smack in the middle of the so-called AI winter, meaning the public didn't pay this much attention. There wasn't that much funding, but there were also a lot of ideas flowing around. And I think a couple of things happened that brought my own career so close to the birth of modern AI. One is that I chose to look at artificial intelligence through the lens of visual intelligence, because humans are deeply visual animals. We can talk a little more later, but so much of our intelligence is built upon visual, perceptual, spatial understanding, not just language per se. I think they're complementary.
So I chose to look at visual intelligence, and in my PhD and my early professor years, my students and I were very committed to a north star problem, which is solving the problem of object recognition, because it's a building block for the perceptual world, right? We go around the world interpreting, reasoning about, and interacting with it more or less at the object level. We don't interact with the world at the molecular level. We sometimes do, but rarely. For example, if you want to lift a teapot, you don't say, "Okay, the teapot is made of a hundred pieces of porcelain, and let me work on these hundred pieces." You look at it as one object and interact with it. So objects are really important. So I was among the first researchers to identify this as a north star problem. But I think what happened is that as a student of AI and a researcher of AI, I was working on all kinds of mathematical models, including neural networks, including Bayesian networks, including many, many models.
And there was one singular pain point: these models didn't have data to be trained on. As a field, we were so focused on the models, but it dawned on me that human learning, as well as evolution, is actually a big data learning process. Humans learn with so much experience, constantly. In evolution, if you look across time, animals evolve by just experiencing the world. So my students and I conjectured that a critically overlooked ingredient for bringing AI to life is big data. And then we began this ImageNet project in 2006, 2007. We were very ambitious. We wanted to get the entire internet's image data on objects. Now granted, the internet was a lot smaller than today, so I felt that ambition was at least not too crazy. Now, it was totally delusional to think a couple of graduate students and a professor could do this.
And that's what we did. We very carefully curated 15 million images from the internet and created a taxonomy of 22,000 concepts, borrowing other researchers' work, like linguists' work on WordNet, which is a particular way of organizing words into a dictionary. We combined that into ImageNet and open-sourced it to the research community. We held an annual ImageNet Challenge to encourage everybody to participate. We continued to do our own research, but 2012 was the moment that many people think was the beginning of deep learning, or the birth of modern AI, because a group of Toronto researchers led by Professor Geoff Hinton participated in the ImageNet Challenge, used ImageNet's big data and two GPUs from NVIDIA, and successfully created the first neural network algorithm that...
It didn't totally solve, but made huge progress towards solving, the problem of object recognition. And that combination, the trio of big data, neural networks, and GPUs, was kind of the golden recipe for modern AI. Fast-forward to the public moment of AI, the ChatGPT moment: if you look at the ingredients of what brought ChatGPT to the world, technically it still uses these three ingredients. Now it's internet-scale data, mostly text; a much more complex neural network architecture than in 2012, but still a neural network; and a lot more GPUs, but still GPUs. So these three ingredients are still at the core of modern AI.
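The three-ingredient recipe described above (labeled data, a neural network, and compute) can be sketched in miniature. This is a deliberately tiny illustration with synthetic 8x8 "images" and a one-hidden-layer NumPy network; nothing here is ImageNet or AlexNet, only the shape of the idea.

```python
# Ingredients of the modern AI recipe, at toy scale:
# (1) labeled data, (2) a neural network, (3) compute for training.
import numpy as np

rng = np.random.default_rng(0)

# (1) Labeled data: two synthetic "object classes", each a noisy
# variation around a class template (a stand-in for curated images).
templates = rng.normal(size=(2, 64))                      # 8x8 flattened
X = np.vstack([t + 0.3 * rng.normal(size=(200, 64)) for t in templates])
y = np.repeat([0, 1], 200)                                # class labels

# (2) A small neural network: one ReLU hidden layer, softmax output.
W1 = 0.1 * rng.normal(size=(64, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.normal(size=(16, 2));  b2 = np.zeros(2)

def forward(inputs):
    h = np.maximum(0, inputs @ W1 + b1)                   # hidden layer
    z = h @ W2 + b2
    p = np.exp(z - z.max(axis=1, keepdims=True))          # stable softmax
    return h, p / p.sum(axis=1, keepdims=True)

# (3) Compute: a short gradient-descent loop on cross-entropy loss.
for _ in range(300):
    h, p = forward(X)
    grad_z = (p - np.eye(2)[y]) / len(y)                  # softmax gradient
    grad_h = (grad_z @ W2.T) * (h > 0)                    # backprop ReLU
    W2 -= 0.5 * h.T @ grad_z; b2 -= 0.5 * grad_z.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h; b1 -= 0.5 * grad_h.sum(axis=0)

_, p = forward(X)
accuracy = (p.argmax(axis=1) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Scale each ingredient up by many orders of magnitude (millions of labeled images, a deep convolutional network, GPUs) and you have the 2012 recipe Dr. Li describes.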

Lenny Rachitsky: Incredible. I have never heard that full story before. I love that it was two GPUs at first. I love that. And now it's, I don't know, hundreds of thousands, right, that are orders of magnitude more powerful.

Dr. Fei-Fei Li: Yep.

Lenny Rachitsky: And those two GPUs, they just bought them, they were like gaming GPUs, they just went to the-

Dr. Fei-Fei Li: Yes.

Lenny Rachitsky: ...the game store that people use for playing games. As you said, this continues to be, in a large way, the way models get smarter. Some of the fastest-growing companies in the world right now, and I've had mostly all of them on the podcast, Mercor and Surge and Scale, continue to do this for labs: just give them more and more labeled data on the things they're most excited and interested in.

Dr. Fei-Fei Li: Yeah, I remember Alex Wang from Scale in the very early days. I probably still have his emails from when he was starting Scale. He was very kind. He kept sending me emails about how ImageNet inspired Scale. I was very pleased to see that.

Lenny Rachitsky: One of my other favorite takeaways from what you just shared is that it's such an example of high agency and just doing things. That's kind of a meme on Twitter: you can just do things. You were just like, okay, this is probably necessary to move AI forward. And it was called machine learning back then, right? Was that the term most people used?

Dr. Fei-Fei Li: I think it was used interchangeably. It's true. I do remember the tech companies, I am not going to name names, but I was in a conversation in the early days, I think it was in the middle of 2015, middle of 2016, and some tech companies avoided using the word AI because they were not sure if AI was a dirty word. And I remember I was actually encouraging everybody to use the word AI, because to me it is one of the most audacious questions humanity has ever asked in our quest for science and technology, and I feel very proud of this term. But yes, at the beginning some people were not sure.

Lenny Rachitsky: What year was that, roughly, when AI was a dirty word?

Dr. Fei-Fei Li: 2016, I think, because that was-

Lenny Rachitsky: 2016, less than 10 years ago.

Dr. Fei-Fei Li: That was the change. Some people started calling it AI, but I think if you look at the Silicon Valley tech companies and trace their marketing terms, 2017-ish was the beginning of companies calling themselves AI companies.

Lenny Rachitsky: That's incredible. Just how the world has changed.

Dr. Fei-Fei Li: Yes.

Lenny Rachitsky: Now, you can't not call yourself an AI company.

Dr. Fei-Fei Li: I know.

Lenny Rachitsky: Just nine-ish years later.

Dr. Fei-Fei Li: Yeah.

Lenny Rachitsky: Oh, man. Okay. Is there anything else around that early history that you think people don't know, that you think is important, before we chat about where you think things are going and the work that you're doing?

Dr. Fei-Fei Li: As with all histories, I'm keenly aware that I am recognized for being part of this history, but there are so many heroes and so many researchers. We're talking about generations of researchers. In my own world, there are so many people who have inspired me, which I talked about in my book. But I do feel our culture, especially Silicon Valley, tends to assign achievements to a single person. While I think that has value, it's just to be remembered: AI is a field that is, at this point, 70 years old, and we have gone through many generations. Nobody, no one, could have gotten here by themselves.

Lenny Rachitsky: Okay, so let me ask you this question. It feels like we're always on this precipice of AGI, this kind of vague term people throw around: AGI is coming, it's going to take over everything. What's your take on how far we might be from AGI? Do you think we're going to get there on the current trajectory we're on? Do you think we need more breakthroughs? Do you think the current approach will get us there?

Dr. Fei-Fei Li: Yeah, this is a very interesting term, Lenny. I don't know if anyone has ever defined AGI. There are many different definitions, from some kind of superpower for machines all the way to machines becoming economically viable agents in society. In other words, making salaries to live. Is that the definition of AGI? As a scientist, I take science very seriously, and I entered the field because I was inspired by this audacious question of, can machines think and do things the way humans can? For me, that's always the north star of AI. And from that point of view, I don't know what the difference is between AI and AGI.

I think we've done very well in achieving parts of the goal, including conversational AI, but I don't think we have completely conquered all the goals of AI. And our founding fathers, Alan Turing... I wonder, if Alan Turing were around today and you asked him to contrast AI versus AGI, he might just shrug and say, "Well, I asked the same question back in the 1940s." So I don't want to go down a rabbit hole of defining AI versus AGI. As a scientist and technologist, I feel AGI is more a marketing term than a scientific term. AI is my north star, my field's north star, and I'm happy for people to call it whatever name they want.

Lenny Rachitsky: So let me ask you maybe this way. Like you described, there are these components that, from ImageNet and AlexNet, took us to where we are today: GPUs, essentially, data, labeled data, and the algorithm of the model. The transformer also feels like an important step in that trajectory. Do you feel like those are the same components that'll get us to, I don't know, a 10 times smarter model, something that's life-changing for the entire world? Or do you think we need more breakthroughs? I know we're going to talk about world models, which I think is a component of this, but is there anything else where you think, oh, this will plateau, or, okay, this will take us there, we just need more data, more compute, more GPUs?

Dr. Fei-Fei Li: Oh no, I definitely think we need more innovations. I think with scaling laws, more data, more GPUs, and bigger versions of the current model architecture, there's still a lot to be done, but I absolutely think we need to innovate more. There's not a single deeply scientific discipline in human history that has arrived at a place that says, we're done, we're done innovating. And AI is one of the youngest, if not the youngest, disciplines in human civilization in terms of science and technology. We're still scratching the surface. For example, like I said, we're going to segue into world models. Today, you take a model and run it through a video of a couple of office rooms and ask the model to count the number of chairs. This is something a toddler could do, or maybe an elementary school kid could do, and AI cannot do that, right?

So there's just so much AI today cannot do. Let alone thinking about how someone like Isaac Newton looked at the movements of the celestial bodies and derived an equation, or a set of equations, that governs the movement of all bodies. That level of creativity, extrapolation, abstraction: we have no way of enabling AI to do that today. And then let's look at emotional intelligence. Think of a student coming to a teacher's office and having a conversation about motivation, passion, what to learn, what's the problem that's really bothering you. In that conversation, as powerful as today's conversational bots are, you don't get that level of emotional and cognitive intelligence from today's AI. So there's a lot we can do better, and I do not believe we're done innovating.

Lenny Rachitsky: Demis had this really interesting interview recently, from DeepMind slash Google, where someone asked him, "What do you think, how far are we from AGI? What does it look like getting there?" He had a really interesting way of approaching it: if we were to give the most cutting-edge model all the information up to the end of the 20th century, see if it could come up with all the breakthroughs Einstein had. And so far we're nowhere near that, but they could just-

Dr. Fei-Fei Li: No, we're not. In fact, it's even worse. Let's give AI all the data, including modern instruments' data on celestial bodies, which Newton did not have, and just ask AI to create the 17th-century set of equations on the laws of the movement of bodies. Today's AI cannot do that.

Lenny Rachitsky: All right. We're a ways away, is what I'm hearing.

Dr. Fei-Fei Li: Yeah.

Lenny Rachitsky: Okay, so let's talk about world models. To me, this is another really amazing example of you being ahead of where people end up. You were way ahead on, okay, we just need a lot of clean data for AI and neural networks to learn. You've been talking about this idea of world models for a long time. You started a company to build one. Essentially, there are language models; this is a different thing. This is a world model. We'll talk about what that is. And now, as I was preparing for this, Elon's talking about world models, Jensen's talking about world models, and I know Google's working on this stuff. You've been at this for a long time, and you actually just launched something, which we're going to talk about, right before this podcast airs. Talk about it: what is a world model? Why is it so important?

Dr. Fei-Fei Li: I'm very excited to see that more and more people are talking about world models, like Elon, like Jensen. I have been thinking all my life about how to really push AI forward, and the large language models that came out of the research world and then OpenAI and all this, for the past few years, were extremely inspiring, even for a researcher like me. I remember when GPT-2 came out, and that was, I think, late 2020. I was co-director, I still am, but I was at that time full-time co-director of Stanford's Human-Centered AI Institute, and I remember the public was not aware of the power of the large language model yet, but as researchers, we were seeing it, we were seeing the future. I had pretty long conversations with my natural language processing colleagues like Percy Liang and Chris Manning. We were talking about how critical this technology was going to be, and the Stanford Human-Centered AI Institute, HAI, was the first to establish a full research center on foundation models.

Percy Liang and many researchers led the first academic paper on foundation models. So it was just very inspiring for me. Of course, I come from the world of visual intelligence, and I was thinking there's so much we can push forward beyond language, because humans use our sense of spatial intelligence, our understanding of the world, to do so many things that are beyond language. Think about a very chaotic first responder scene, whether it's a fire or a traffic accident or a natural disaster. If you immerse yourself in that scene and think about how people organize themselves to rescue people, to stop further disasters, to put down fires, a lot of that is movement, is spontaneous understanding of objects, worlds, human situational awareness. Language is part of that, but in a lot of those situations, language cannot get you to put down the fire.
So what is that? I was thinking about it a lot. In the meantime, I was doing a lot of robotics research, and it dawned on me that the linchpin connecting these additional intelligences, embodied AI such as robotics in addition to language, and visual intelligence, is the sense of spatial intelligence, of understanding the world. And that's when, I think it was 2024, I gave a TED talk about spatial intelligence and world models. I started formulating this idea back in 2022, based on my robotics and computer vision research. And then one thing that was really clear to me is that I really wanted to work with the brightest technologists and move as fast as possible to bring this technology to life. And that's when we founded this company called World Labs. You can see the word world is in the title of our company, because we believe so much in world modeling and spatial intelligence.

Lenny Rachitsky: People are so used to chatbots, and that's a large language model. A simple way to understand a world model is that you basically describe a scene and it generates an infinitely explorable world. We'll link to the thing you launched, which we'll talk about, but is that a simple way to understand it?

Dr. Fei-Fei Li: That's part of it, Lenny. I think a simple way to understand a world model is that this model allows anyone to create any world in their mind's eye by prompting, whether with an image or a sentence. And also to interact in this world, whether you are browsing and walking around, or picking objects up, or changing things, as well as to reason within this world. For example, if the agent consuming the output of the world model is a robot, it should be able to plan its path and help tidy the kitchen. So a world model is a foundation that you can use to reason, to interact, and to create worlds.
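The three capabilities listed here, create, interact, and reason, can be sketched as a toy interface. Everything below (the class name, the methods, the flat list-of-objects scene representation) is a hypothetical illustration of the idea, not World Labs' actual model or API.

```python
# Toy sketch of the three world-model capabilities:
# create a world from a prompt, interact with it, reason about it.
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    # Each object is a (name, position) pair; positions are 2D here
    # purely for simplicity. A real world model holds a full 3D scene.
    objects: list

    @classmethod
    def from_prompt(cls, prompt: str) -> "ToyWorldModel":
        # "Create": a real model synthesizes an explorable world from an
        # image or a sentence; here we just place one object per phrase.
        names = [p.strip() for p in prompt.split(",")]
        return cls([(name, (i, 0)) for i, name in enumerate(names)])

    def move(self, index: int, dx: int, dy: int) -> None:
        # "Interact": change the state of the world (pick up, rearrange).
        name, (x, y) = self.objects[index]
        self.objects[index] = (name, (x + dx, y + dy))

    def count(self, keyword: str) -> int:
        # "Reason": answer a question about the scene, e.g. counting
        # chairs, the task Dr. Li notes today's models fail at.
        return sum(keyword in name for name, _ in self.objects)


world = ToyWorldModel.from_prompt("kettle, chair, chair, table")
world.move(0, 2, 1)              # slide the kettle across the counter
chairs = world.count("chair")
print(chairs)                    # prints 2
```

A robot or a human user would sit on top of an interface like this: query the scene, plan a path through it, and act on the objects it contains.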

Lenny Rachitsky: Great. Yeah. So robots feel like potentially the next big focus for AI researchers, and for AI's impact on the world. And what you're saying here is that this is a key missing piece of making robots actually work in the real world: understanding how the world works.

Dr. Fei-Fei Li: Yeah. Well, first of all, I do think there's more than robots. That's exciting. But I agree with everything you just said. I think world modeling and spatial intelligence is a key missing piece of embodied AI. I also think let's not underestimate that humans are embodied agents, and humans can be augmented by AI's intelligence. Just like today: humans are language animals, but we're very much augmented by AI helping us do language tasks, including software engineering. I think we shouldn't underestimate, or maybe we tend not to talk about, how humans, as embodied agents, can benefit as much from world models and spatial intelligence models as robots can.

Lenny Rachitsky: So the big unlocks here: robots, which are a huge deal if this works out. Imagine each of us has robots doing a bunch of stuff for us; they help us with disasters, things like that. Games, obviously, are a really cool example: infinitely playable games that you just invent out of your head. And then creativity: having fun, being creative, thinking of magical, wild new worlds and environments.

Dr. Fei-Fei Li: And also design. Humans design everything from machines to buildings to homes. And also scientific discovery. There is so much. I like to use the example of the discovery of the structure of DNA. One of the most important pieces in the history of DNA's discovery is the x-ray diffraction photo captured by Rosalind Franklin. It was a flat 2D photo of a structure that looks like a cross with diffraction patterns. You can google those photos. But with that 2D flat photo, humans, especially two important humans, James Watson and Francis Crick, together with their other information, were able to reason in 3D space and deduce the highly three-dimensional double helix structure of DNA. And that structure cannot possibly be deduced in 2D. You cannot think in 2D and arrive at that structure. You have to think spatially, in 3D, using human spatial intelligence. So I think even in scientific discovery, spatial intelligence, or AI-assisted spatial intelligence, is critical.

Lenny RachitskyThis is such an example of, I think it was Chris Dixon who had this line that the next big thing is going to start off feeling like a toy. When ChatGPT first came out, I remember Sam Altman just tweeted it as, "Here's a cool thing we're playing with, check it out." Now it's the fastest-growing product in all of history, and it changed the world. And it's oftentimes the things that just look cool, that are fun to play with, that end up changing the world most.

Lenny RachitskyI reached out to Ben Horowitz, who loves what you're doing, a big fan of yours. They're investors, I believe, in...

Dr. Fei Fei LiYeah, we've known each other for many years, but yes, right now they're investors of World Labs.

Lenny RachitskyAmazing. Okay, so I asked him what I should ask you about, and he suggested I ask why the bitter lesson alone is not likely to work for robots. So first of all, just explain what the bitter lesson was in the history of AI, and then why that won't get us to where we want to be with robots.

Dr. Fei Fei LiWell, first of all, there are many bitter lessons, but the bitter lesson everybody refers to is an essay written by Richard Sutton, who won the Turing Award recently and does a lot of reinforcement learning. Richard said, if you look at the history, especially the algorithmic development of AI, it turns out simpler models with a ton of data always win at the end of the day over more complex models with less data. I mean, that was actually... this essay came years after ImageNet. To me that was not bitter; it was a sweet lesson. That's why I built ImageNet, because I believed that big data plays that role. So why can't the bitter lesson alone work in robotics? Well, first of all, I think we need to give credit to where we are today. Robotics is very much in the early days of experimentation.

The research is not nearly as mature as, say, language models. So many people are still experimenting with different algorithms, and some of those algorithms are driven by big data. So I do think big data will continue to play a role in robotics. But what is hard for robotics? There are a couple of things. One is that it's harder to get data. It's a lot harder to get data. You can say, well, there's web data, and this is where the latest robotics research is using web videos. I think web videos do play a role. But if you think about what made language models work... As someone who does computer vision and spatial intelligence and robotics, I'm very jealous of my colleagues in language, because they had this perfect setup where their training data are words, eventually tokens, and they produce a model that outputs words.
So you have this perfect alignment between what you hope to get, which we call the objective function, and what your training data looks like. But robotics is different. Even spatial intelligence is different. You hope to get actions out of robots, but your training data lacks actions in 3D worlds, and that's what robots have to do, right? Actions in 3D worlds. So you have to find different ways to fit, what do they call it, a square peg in a round hole, when what we have is tons of web videos. So then we have to start talking about supplementing data, such as teleoperation data or synthetic data, so that robots are trained under this hypothesis of the bitter lesson, which is a large amount of data. I think there's still hope, because even what we are doing in world modeling will really unlock a lot of this information for robots.
But I think we have to be careful, because we're at the early days of this, and the bitter lesson is still to be tested because we haven't fully figured out the data part. Another part of the bitter lesson of robotics I think we should be realistic about is, again, compared to language models or even spatial models, robots are physical systems. Robots are closer to self-driving cars than to a large language model. And that's very important to recognize. It means that in order for robots to work, we not only need brains, we also need the physical body. We also need application scenarios. If you look at the history of the self-driving car, my colleague Sebastian Thrun took Stanford's car to win the DARPA Grand Challenge in 2005. It's been 20 years from that prototype of a self-driving car being able to drive 130 miles in the Nevada desert to today's Waymo on the streets of San Francisco.
And we're not even done yet; there's still a lot. So that's a 20-year journey. And self-driving cars are much simpler robots: they're just metal boxes running on 2D surfaces, and the goal is not to touch anything. Robots are 3D things running in a 3D world, and the goal is to touch things. So the journey is going to have many aspects and elements. Of course, one could say, well, early self-driving car algorithms were from the pre-deep-learning era, and deep learning is accelerating the brains. I think that's true. That's why I'm in robotics, that's why I'm in spatial intelligence, and I'm excited by it. But in the meantime, the car industry is very mature, and productizing also involves mature use cases, supply chains, the hardware. So I think it's a very interesting time to work on these problems. But it's true, Ben is right. We might still be subject to a number of bitter lessons.

Lenny RachitskyDoing this work, do you ever just feel awe for the way the brain works and is able to do all of this for us? Just the complexity of getting a machine to walk around and not hit things or fall. Does it give you more respect for what we've already got?

Dr. Fei Fei LiTotally. We operate on about 20 watts. That's dimmer than any light bulb in the room I'm in right now. And yet we can do so much. So I think actually the more I work in AI, the more I respect humans.

Lenny RachitskyLet's talk about this product you just launched. It's called Marble, a very cute name. Talk about what this is, why this is important. I've been playing with it, it's incredible. We'll link to it for folks to check it out. What is Marble?

Dr. Fei Fei LiYeah, I'm very excited. So first of all, Marble is one of the first products that World Labs has rolled out. World Labs is a frontier foundation model company, founded by four co-founders with deep technical histories. My co-founders are Justin Johnson, Christoph Lassner, and Ben Mildenhall. We all come from the research fields of AI, computer graphics, and computer vision, and we believe that spatial intelligence and world modeling are as important as, if not more important than, language models, and complementary to them. So we wanted to seize this opportunity to create a deep-tech research lab that can connect the dots between frontier models and products. Marble is an app built upon our frontier models. We've spent a year and more building the world's first generative model that can output genuinely 3D worlds. That's a very, very hard problem.

And it was a very hard process, and we have a founding team of incredible technologists. Then, around just a month or two ago, we saw for the first time that we can just prompt with a sentence, an image, or multiple images and create worlds that we can navigate in. If you put on goggles, which we have an option to let you do, you can even walk around. Even though we've been building this for quite a while, it was still just awe-inspiring, and we wanted to get it into the hands of people who need it. We know that so many creators, designers, people thinking about robotic simulation, people thinking about different use cases of navigable, interactable, immersive worlds, and game developers will find this useful. So we developed Marble as a first step. It's, again, still very early, but it's the world's first model doing this, and it's the world's first product that allows people to just prompt, we call it prompt to worlds.

Lenny RachitskyWell, I've been playing around with it. It is insane. You could just have a little Shire world where you infinitely walk around Middle-earth, basically, and there's no one there yet, but it's insane. You just go anywhere. There are dystopian worlds. I'm just looking at all these examples, and my favorite part, actually, I don't know if it's a feature or a bug: you can see the dots of the world before it actually renders with all the textures. And I just love that you get a glimpse into what is going on with this model, basically-

Dr. Fei Fei LiThat is so cool to hear, because this is where, as a researcher, I am learning. The dots that lead you into the world are an intentional feature visualization; they're not part of the model. The model actually just generates the world. But we were trying to find a way to guide people into the world, and a number of engineers worked on different versions, but we converged on the dots. And so many people, you're not the only one, told us how delightful that experience is. It was really satisfying for us to hear that this intentional visualization feature, which is not part of the big hardcore model, actually has delighted our users.

Lenny RachitskyWow. So you added that to make it more, like, to help humans understand what's going on-

Dr. Fei Fei LiTo have fun, yes.

Lenny Rachitsky... get more delightful. Wow, that is hilarious. It makes me think about LLMs and the way they, it's not the same thing, but they talk about what they're thinking and what they're doing.

Dr. Fei Fei LiYes, it is. It is.

Lenny RachitskyIt also makes me think about just the Matrix. It's exactly the Matrix experience. I don't know if that was your inspiration.

Dr. Fei Fei LiWell, like I said, a number of engineers worked on that. It could be their inspiration.

Lenny RachitskyIt's in their subconscious. Okay, so just for folks that may want to play around with this, maybe like, what are some applications today that folks can start using today? What's your goal with this launch?

Dr. Fei Fei LiYeah, so we do believe that world modeling is very horizontal, but we're already seeing some really exciting use cases. Virtual production for movies, because what they need are 3D worlds that they can align with the camera, so when the actors are acting, they can position the camera and shoot the segments really well. And we're already seeing incredible use. In fact, I don't know if you have seen our launch video showing Marble. It was produced by a virtual production company; we collaborated with Sony, and they used Marble scenes to shoot those videos. We were collaborating with those technical artists and directors, and they were saying this has cut their production time by 40x. In fact, it had to-

Lenny Rachitsky40X?

Dr. Fei Fei LiYes, in fact it had to, because we only had one month to work on this project, and there were so many things they were trying to shoot. So using Marble really, really significantly accelerated the virtual production for VFX and movies. That's one use case. We are already seeing our users taking a Marble scene, taking the mesh export, and putting it into games, whether it's games in VR or just fun games they have developed. We are showing an example of robotic simulation, because when I was, I mean I still am, a researcher doing robotic training, one of the biggest pain points is creating synthetic data for training robots. This synthetic data needs to be very diverse; it needs to come from different environments with different objects to manipulate. And one path to it is to ask computers to simulate.

Otherwise, humans have to build every single asset for robots, and that's just going to take a lot longer. So we already have researchers reaching out and wanting to use Marble to create those synthetic environments. We also have unexpected user outreach in terms of how they want to use Marble. For example, a psychology team called us about using Marble to do psychology research. It turns out that for some of the psychiatric patients they study, they need to understand how their brains respond to different immersive scenes with different features, for example, messy scenes or clean scenes or whatever you name it. It's very hard for researchers to get their hands on these kinds of immersive scenes, and it would take them too long and too much budget to create them. Marble is an almost instantaneous way of getting so many of these experimental environments into their hands. So we're seeing multiple use cases at this point. But the VFX people, the game developers, the simulation developers, as well as designers, are very excited.

Lenny RachitskyThis is very much the way things work in AI. I've had other AI leaders on the podcast, and it's always: put things out there as early as you can to discover where the big use cases are. The head of ChatGPT told me how, when they first put out ChatGPT, he was just scanning TikTok to see how people were using it and all the things they were talking about, and that's what convinced them where to lean in and helped them see how people actually want to use it. I love this last use case for therapy. I'm just imagining heights, people dealing with heights or snakes or spiders, which-

Dr. Fei Fei LiIt's amazing. A friend of mine literally called me last night and talked about his fear of heights and asked me if Marble could be used for that. It's amazing you went straight there.

Lenny RachitskyBecause I'm imagining all the exposure therapy stuff; this could be so good for that. That is so cool. Okay, so I should have asked you this before, but I think there's going to be a question of how this differs from things like Veo 3 and other video generation models. It's pretty clear to me, but I think it might be helpful to explain how this is different from all the video AI tools people have seen.

Dr. Fei Fei LiWorld Labs' thesis is that spatial intelligence is fundamentally very important, and spatial intelligence is not just about videos. In fact, experiencing the world is not passively watching videos pass by. I love Plato's allegory of the cave, the analogy he used to describe vision. He said: imagine a prisoner tied to a chair, not very humane, in a cave, watching a whole live theater in front of him. But the actual live theater the actors are performing is behind his back; it is lit so that the projection of the action falls on a wall of the cave. The task of this prisoner is to figure out what's going on. It's a pretty extreme example, but it really describes what vision is about: making sense of the 3D or 4D world out of 2D. So spatial intelligence to me is deeper than only creating that flat 2D world.

Spatial intelligence to me is the ability to create, reason about, interact with, and make sense of a deeply spatial world, whether it's 2D or 3D or 4D, including dynamics and all that. So World Labs is focusing on that, and of course the ability to create videos per se could be part of this. In fact, just a couple of weeks ago, we rolled out the world's first demoable real-time video generation on a single H100 GPU. So part of our technology includes that, but I think Marble is very different, because we really want creators, designers, and developers to have in their hands a model that can give them worlds with 3D structure so they can use it for their work. That's why Marble is so different.

Lenny RachitskyThe way I see it, it's a platform for a ton of opportunity to do stuff. Whereas with videos it's just like, here's a one-off video that's very fun and cool, and... that's it. That's it. And you move on.

Dr. Fei Fei LiBy the way, in Marble, we allow people to export in video form. So you could actually, like you said, go into a world, let's say it's a hobbit cave. Especially as a creator, you have such a specific way of moving the camera along the trajectory in the director's mind, and then you can export that from Marble into a video.

Lenny RachitskyWhat does it take to create something like this? How big is the team, how many GPUs are you working with? Anything you can share there. I don't know how much of this is private information, but what does it take to create something like this that you've launched here?

Dr. Fei Fei LiIt takes a lot of brain power. We just talked about 20 watts per brain, so from that point of view it's a small number, but it's actually incredible. It took half a billion years of evolution to give us that power. We have a team of 30-ish people now, predominantly researchers and research engineers, but we also have designers and product people. We really believe in creating a company that's anchored in the deep tech of spatial intelligence but is building serious products. So we have this integration of R&D and productization. And of course, we use a ton of GPUs.

Lenny RachitskyThat's the technical thing.

Dr. Fei Fei LiHappy to hear.

Lenny RachitskyWell, congrats on the launch. I know this is a huge milestone. I know this took a ton of work.

Dr. Fei Fei LiThank you.

Lenny RachitskySo I just want to say congrats to you and your team. Let me talk about your founder journey for a moment. So you're a founder of this company. You started how many years ago? A couple of years ago, two, three years ago?

Dr. Fei Fei LiA year ago.

Lenny RachitskyA year ago?

Dr. Fei Fei LiA year plus.

Lenny RachitskyA year? Okay. Wow.

Dr. Fei Fei LiProbably, 18 month, yeah.

Lenny RachitskyOkay. What's something you wish you knew before you started this that you wish you could whisper into the ear of Fei-Fei of 18 months ago?

Dr. Fei Fei LiWell, I continue to wish I knew the future of technology. I actually think one of our founding advantages is that, in general, we see the future earlier than most people. But still, man, it's so exciting and so amazing, what's unknown and what's coming. But I know the reason you're asking me this question is not about the future of technology. Look, I did not start a company of this scale at 20 years old. I did start a dry cleaner when I was 19, but that's a little smaller scale.

Lenny RachitskyWe got to talk about that.

Dr. Fei Fei LiAnd then I founded Google Cloud AI, and I founded an institute at Stanford, but those are different beasts. I did feel I was a little more prepared as a founder for the grinding journey compared to maybe the 20-year-old founders. But I'm still surprised, and it puts me into paranoia sometimes, by how intensely competitive the AI landscape is, from the models and the technology itself to talent. When I founded the company, we did not have these incredible stories of how much certain talent would cost. So these are things that continue to surprise me, and I have to be very alert about them.

Lenny RachitskySo the competition you're talking about is the competition for talent, and the speed at which things are moving.

Dr. Fei Fei LiYeah.

Lenny RachitskyYeah. You mentioned this point that I want to come back to: if you look over the course of your career, you were at all of the major collections of humans that led to so many of the breakthroughs happening today. Obviously we talked about ImageNet, but also SAIL at Stanford, where a lot of the work happened, and Google Cloud, where a lot of the breakthroughs happened. What brought you to those places? For people looking for how to advance in their career and be at the center of the future, is there a through line of what pulled you from place to place and into those groups that might be helpful for people to hear?

Dr. Fei Fei LiYeah, this is actually a great question, Lenny, because I do think about it. Obviously, we talked about how it's curiosity and passion that brought me to AI; that is more a scientific north star, right? I did not care if AI was a thing or not, so that was one part. But how did I end up choosing the particular places I worked, including starting World Labs? I think I'm very grateful to myself, or maybe to my parents' genes: I'm an intellectually very fearless person. And I have to say, when I hire young people, I look for that, because I think that's a very important quality if one wants to make a difference. When you want to make a difference, you have to accept that you're creating something new or diving into something new that people haven't done. And if you have that self-awareness, you almost have to allow yourself to be fearless and courageous.

So when I, for example, came to Stanford: in the world of academia, I was very close to this thing called tenure, which is having the job forever, at Princeton. But I chose to come to Stanford because... I love Princeton. It's my alma mater. It's just that at that moment there were people who were so amazing at Stanford, and the Silicon Valley ecosystem was so amazing, that I was okay taking the risk of restarting my tenure clock. Becoming the first female director of SAIL, I was actually, relatively speaking, a very young faculty member at that time, and I wanted to do that because I care about that community. I didn't spend too much time thinking about all the failure cases.
Obviously, I was very lucky that the more senior faculty supported me, but I just wanted to make a difference. And then going to Google was similar: I wanted to work with people like Jeff Dean, Geoffrey Hinton, and all these incredible minds. The same with World Labs. I have this passion, and I also believe that people with the same mission can do incredible things. So that's the through line that guided me. I don't overthink all the possible things that can go wrong, because there are too many.

Lenny RachitskyI feel like an important element of this is not focusing on the downside, and focusing more on the people, the mission, what gets you excited, the curiosity.

Dr. Fei Fei LiYeah. I do want to say one thing to all the young talents in AI, the engineers and researchers out there, because some of you have applied to World Labs, and I feel very privileged that you considered us. I find many young people today think about every single aspect of the equation when they decide on jobs. Maybe that's the way they want to do it, but sometimes I want to encourage young people to focus on what's important, because I find myself constantly in mentoring mode when I talk to job candidates. Not necessarily recruiting or not recruiting, but in mentoring mode, when I see an incredible young talent who is over-focusing on every minute dimension and aspect of considering a job, when maybe the most important thing is: where's your passion? Do you align with the mission? Do you believe and have faith in this team? Just focus on the impact you can make and the kind of work and team you can work with.

Lenny RachitskyYeah, it's tough. It's tough for people in the AI space. There's so much coming at them now, so much new, so much happening, so much FOMO.

Dr. Fei Fei LiThat's true.

Lenny RachitskyI can see the stress. And so I think that advice is really important: what will actually make you feel fulfilled in what you're doing, not just where's the fastest-growing company, where's the... who's going to win? I don't know. I want to make sure I ask you about the work you're doing today at Stanford, at the HCI. I think it's the-

Dr. Fei Fei LiHAI.

Lenny RachitskyHAI, Human-Centered AI Institute. What are you doing there? I know this is a thing you do on the side still.

Dr. Fei Fei LiSo yes, HAI, the Human-Centered AI Institute, was co-founded by me and a group of faculty, like Professor John Etchemendy, Professor James Landay, and Professor Chris Manning, back in 2018. I was actually finishing my last sabbatical at Google, and it was a very, very important decision for me, because I could have stayed in industry. But my time at Google taught me one thing: AI is going to be a civilizational technology. It dawned on me how important this is to humanity, to the point that I wrote a piece in The New York Times that year, 2018, about the need for a guiding framework to develop and apply AI. And that framework has to be anchored in human benevolence, in human-centeredness. I felt that Stanford, one of the world's top universities, in the heart of the Silicon Valley that gave birth to important companies from NVIDIA to Google, should be a thought leader in creating this human-centered AI framework and actually embodying it in our research, education, policy, and ecosystem work.

So I founded HAI. Fast-forward six, seven years, and it has become the world's largest AI institute doing human-centered research, education, ecosystem outreach, and policy impact. It involves hundreds of faculty across all eight schools at Stanford, from medicine to education, to sustainability, to business, to engineering, to humanities, to law. We support researchers, especially in interdisciplinary areas, from the digital economy, to legal studies, to political science, to the discovery of new drugs, to new algorithms beyond transformers. We also put a very strong focus on policy, because when we started HAI, I realized that Silicon Valley did not talk to Washington, D.C., or Brussels, or other parts of the world.
And given how important this technology is, we need to bring everybody on board. So we created multiple programs, from a congressional bootcamp, to the AI Index report, to policy briefings, and we especially participated in policymaking, including advocating for a national AI research cloud bill that was passed during the first Trump administration, and participating in state-level AI regulatory discussions. So there's a lot we did, and I continue to be one of the leaders, even though I'm much less involved operationally, because I care not only that we create this technology, but that we use it in the right way.

Lenny RachitskyWow. I was not aware of all that other work you were doing. As you were talking, I was reminded that Charlie Munger had this quote: "Take a simple idea and take it very seriously." I feel like you've done that in so many different ways and stayed with it, and it's unbelievable the impact you've had in so many ways over the years. I'm going to skip the lightning round and just ask you one last question. Is there anything else you wanted to share? Anything else you want to leave listeners with?

Dr. Fei Fei LiI am very excited by AI, Lenny. I want to answer one question that everybody asks me when I travel around the world: if I'm a musician, a middle school teacher, a nurse, an accountant, a farmer, do I have a role in AI, or is AI just going to take over my life or my work? I think this is the most important question of AI, and I find that in Silicon Valley we tend not to speak heart-to-heart with people, people like us and not like us. We tend to just toss around words like infinite productivity or infinite leisure time or infinite power or whatever. But at the end of the day, AI is about people. And when people ask me that question, it's a resounding yes: everybody has a role in AI.

It depends on what you do and what you want. But no technology should take away human dignity, and human dignity and agency should be at the heart of the development, the deployment, and the governance of every technology. So if you are a young artist and your passion is storytelling, embrace AI as a tool. In fact, embrace Marble. I hope it becomes a tool for you, because the way you tell your story is unique and the world still needs it. How you tell your story, how you use the most incredible tools to tell your story in the most unique way, is important, and that voice needs to be heard. If you are a farmer near retirement, AI still matters, because you are a citizen. You can participate in your community, and you should have a voice in how AI is used and applied.
I encourage all of you, and the people you work with, to use AI to make life easier for you. If you are a nurse, I hope you know that, at least in my career, I have worked so much in healthcare research because I feel our healthcare workers should be greatly augmented and helped by AI technology, whether it's smart cameras to feed more information or robotic assistance, because our nurses are overworked and over-fatigued, and as our society ages, we need more help for people to be taken care of. AI can play that role. So I just want to say that it's so important that even technologists like me are sincere that everybody has a role in AI.

Lenny RachitskyWhat a beautiful way to end it. Such a tie back to where we started: it's up to us, and we should take individual responsibility for what AI will do in our lives. Final question: where can folks find Marble? Where can they go to maybe try to join World Labs if they want to? What's the website? Where do people go?

Dr. Fei Fei LiWell, the World Labs website is www.worldlabs.ai, and you can find our research progress there. We have technical blogs. You can find Marble, the product, there. You can sign in there. You can find our job post links there. We're in San Francisco, and we'd love to work with the world's best talent.

Lenny RachitskyAmazing. Fei-Fei, thank you so much for being here.

Dr. Fei Fei LiThank you, Lenny.

Lenny RachitskyBye everyone.

Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.


Lenny Rachitsky: Today, my guest is Dr. Fei-Fei Li, who's known as the godmother of AI. Fei-Fei has been responsible for and at the center of many of the biggest breakthroughs that sparked the AI revolution that we're currently living through. She spearheaded the creation of ImageNet, which was basically her realizing that AI needed a ton of clean, labeled data to get smarter, and that dataset became the breakthrough that led to the current approach to building and scaling AI models. She was chief AI scientist at Google Cloud, which is where some of the biggest early technology breakthroughs emerged from. She was director of SAIL, Stanford's Artificial Intelligence Lab, where many of the biggest AI minds came out of. She's also co-creator of Stanford's Human-Centered AI Institute, which is playing a vital role in the direction that AI is taking. She's also been on the board of Twitter, was named one of Time's 100 Most Influential People in AI, and serves on a United Nations advisory board. I could go on.

In our conversation, Fei-Fei shares a brief history of how we got to today in the world of AI, including this mind-blowing reminder that 9 to 10 years ago, calling yourself an AI company was basically a death knell for your brand because no one believed that AI was actually going to work. Today, it's completely different. Every company is an AI company. We also chat about her take on how she sees AI impacting humanity in the future, how far current technologies will take us, why she's so passionate about building a world model and what exactly world models are, and most exciting of all, the launch of the world's first large world model, Marble, which just came out as this podcast comes out. Anyone can go play with this at marble.worldlabs.ai. It's insane. Definitely check it out. Fei-Fei is incredible and way too under the radar for the impact that she's had on the world, so I am really excited to have her on and to spread her wisdom with more people.
Fei-Fei, thank you so much for being here and welcome to the podcast.

Dr. Fei-Fei Li: I'm excited to be here, Lenny.

Lenny Rachitsky: I'm even more excited to have you here. It is such a treat to get to chat with you. There's so much that I want to talk about. You've been at the center of this AI explosion that we're seeing right now for so long. We're going to talk about a bunch of the history that I think a lot of people don't even know about how this whole thing started, but let me first read a quote from Wired about you just so people get a sense, and in the intro I'll share all of the other epic things you've done. But I think this is a good way to just set context. "Fei-Fei is one of a tiny group of scientists, a group perhaps small enough to fit around a kitchen table, who are responsible for AI's recent remarkable advances."

A lot of people call you the godmother of AI, and unlike a lot of AI leaders, you're an AI optimist. You don't think AI is going to replace us. You don't think it's going to take all our jobs. You don't think it's going to kill us. So I thought it'd be fun to start there, just what's your perspective on how AI is going to impact humanity over time?

Dr. Fei-Fei Li: Yeah, okay, so Lenny, let me be very clear. I'm not a utopian, so it's not like I think AI will have no impact on jobs or people. In fact, I'm a humanist. I believe that whatever AI does, currently or in the future, is up to us. It's up to the people. So I do believe technology is a net positive for humanity, if you look at the long course of civilization. Fundamentally, we're an innovative species. If you look from written records thousands of years ago to now, humans just kept innovating ourselves and innovating our tools, and with that, we make lives better, we make work better, we build civilization, and I do believe AI is part of that. So that's where the optimism comes from. But I think every technology is a double-edged sword, and if we're not doing the right thing as a species, as a society, as communities, as individuals, we can screw this up as well.

Lenny Rachitsky: There's this line, I think this was when you were presenting to Congress: "There's nothing artificial about AI. It's inspired by people. It's created by people, and most importantly, it impacts people." I don't have a question there, but what a great line.

Dr. Fei-Fei Li: Yeah, I feel that pretty deeply. I started working in AI two and a half decades ago, and I've had students for the past two decades. Almost every student who graduates from my lab, I remind them that your field is called artificial intelligence, but there's nothing artificial about it.

Lenny Rachitsky: Coming back to the point you just made about how it's kind of up to us where this all goes, what is it you think we need to get right? How do we set things on a path? I know this is a very difficult question to answer, but just what's your advice? What do you think we should be keeping in mind?

Dr. Fei-Fei Li: Yeah, how many hours do we have?

Lenny Rachitsky: How do we align AI? There we go. Let's solve it.

Dr. Fei-Fei Li: So I think people should be responsible individuals no matter what we do. This is what we teach our children, and this is what we need to do as grownups as well. No matter which part of AI development or AI deployment or AI application you are participating in, and most likely many of us, especially as technologists, are at multiple points, we should act like responsible individuals and care about this. Actually, care a lot about this. I think everybody today should care about AI because it is going to impact your individual life. It is going to impact your community, it's going to impact society and future generations. And caring about it as a responsible person is the first, but also the most important, step.

Lenny Rachitsky: Okay, so let me actually take a step back and kind of go to the beginning of AI. Most people started hearing and caring about AI, as it's called today, just like, I don't know, a few years ago when ChatGPT came out. Maybe it was like three years ago.

Dr. Fei-Fei Li: Almost three years ago. One more month and it will be three years.

Lenny Rachitsky: Wow, okay. And that was ChatGPT coming out. Is that the milestone you have in mind?

Dr. Fei-Fei Li: Yes.

Lenny Rachitsky: Okay, cool. That's exactly how I saw it. But very few people know there was a long, long history of people working on this. It was called machine learning back then, and there were other terms, and now everything's just AI. There was a long period of a lot of people working on it, and then there's what people refer to as the AI winter, where most people almost gave up and said, okay, this idea isn't going anywhere. And then the work you did was essentially the spark that brought us out of the AI winter and is directly responsible for the world now where AI is all we talk about. As you just said, it's going to impact everything we do. So I thought it'd be really interesting to hear from you the brief history of what the world was like before ImageNet, the work you did to create ImageNet, why that was so important, and then what happened after.

Dr. Fei-Fei Li: It is, for me, hard to keep in mind that AI is so new for everybody when I've lived my entire professional life in AI. There's a part of me that finds it so satisfying to see a personal curiosity that I started barely out of teenagehood now become a transformative force of our civilization. It genuinely is a civilizational-level technology. That journey is about 20-something, 20-plus years, and it's just very satisfying. So where did it all start? Well, I'm not even a first-generation AI researcher. The first generation really dates back to the '50s and '60s, and Alan Turing was ahead of his time in the '40s, daring humanity with the question: are there thinking machines? And of course he had a specific way of testing this concept of a thinking machine, which is a conversational chatbot; by his standard, we now have a thinking machine.

But that was just a more anecdotal inspiration. The field really began in the '50s, when computer scientists came together to look at how we can use computer programs and algorithms to build programs that can do things that had only been possible through human cognition. And that was the beginning. Among the founding fathers at the Dartmouth workshop in 1956 was Professor John McCarthy, who later came to Stanford and who coined the term artificial intelligence. The '50s, '60s, '70s, and '80s were the early days of AI exploration: we had logic systems, we had expert systems, and we also had early exploration of neural networks. And then came the late '80s, the '90s, and the very beginning of the 21st century. That stretch of about 20 years was actually the beginning of machine learning, the marriage between computer programming and statistical learning.
And that marriage brought a very, very critical concept into AI, which is that purely rule-based programs are not going to account for the vast amount of cognitive capabilities that we imagine computers can have. So we have to use machines to learn the patterns. Once machines can learn the patterns, they have a hope of doing more things. For example, if you give a machine three cats, the hope is not just for it to recognize these three cats. The hope is that it can recognize the fourth cat, the fifth cat, the sixth cat, and all the other cats. And that's a learning ability that is fundamental to humans and other animals. And we, as a field, realized, "We need machine learning." So that was up until the beginning of the 21st century. I entered the field of AI literally in the year 2000. That's when my PhD began at Caltech.
And so I was one of the first generation of machine learning researchers, and we were already studying this concept of machine learning, especially neural networks. I remember one of my first courses at Caltech was called Neural Networks. But it was very painful. It was still smack in the middle of the so-called AI winter, meaning the public didn't pay much attention. There wasn't that much funding, but there were also a lot of ideas flowing around. And I think two things happened that brought my own career so close to the birth of modern AI. One is that I chose to look at artificial intelligence through the lens of visual intelligence, because humans are deeply visual animals. We can talk a little more later, but so much of our intelligence is built upon visual, perceptual, spatial understanding, not just language per se. I think they're complementary.
So I chose to look at visual intelligence, and in my PhD and my early professor years, my students and I were very committed to a north star problem: solving the problem of object recognition, because it's a building block for the perceptual world, right? We go around the world interpreting, reasoning about, and interacting with it more or less at the object level. We don't interact with the world at the molecular level. We sometimes do, but rarely. For example, if you want to lift a teapot, you don't say, "Okay, the teapot is made of a hundred pieces of porcelain, so let me work on these hundred pieces." You look at it as one object and interact with it. So objects are really important, and I was among the first researchers to identify this as a north star problem. But I think what happened is that as a student of AI and a researcher of AI, I was working on all kinds of mathematical models, including neural networks, Bayesian networks, many, many models.
And there was one singular pain point: these models didn't have data to be trained on. As a field, we were so focused on the models, but it dawned on me that human learning, as well as evolution, is actually a big-data learning process. Humans learn from so much experience, constantly. In evolution, over time, animals evolved just by experiencing the world. So my students and I conjectured that a critically overlooked ingredient of bringing AI to life was big data. And then we began this ImageNet project in 2006, 2007. We were very ambitious. We wanted to get the entire internet's image data on objects. Now, granted, the internet was a lot smaller than today, so I felt that ambition was at least not too crazy. Now, it's totally delusional to think a couple of graduate students and a professor can do this.
And that's what we did. We carefully curated 15 million images from the internet and created a taxonomy of 22,000 concepts, borrowing other researchers' work, like linguists' work on WordNet, which is a particular way of organizing words into a dictionary. We combined that into ImageNet and open-sourced it to the research community. We held an annual ImageNet Challenge to encourage everybody to participate, and we continued to do our own research. But 2012 was the moment that many people think was the beginning of deep learning, or the birth of modern AI, because a group of Toronto researchers led by Professor Geoff Hinton participated in the ImageNet Challenge, used ImageNet's big data and two GPUs from NVIDIA, and successfully created the first neural network algorithm that can...
It didn't totally solve the problem, but it made huge progress towards solving object recognition. And that trio of technologies, big data, neural networks, and GPUs, was kind of the golden recipe for modern AI. Fast-forward to the public moment of AI, which is the ChatGPT moment: if you look at the ingredients of what brought ChatGPT to the world, technically it still uses these three ingredients. Now it's internet-scale data, mostly text; a much more complex neural network architecture than in 2012, but still a neural network; and a lot more GPUs, but still GPUs. So these three ingredients are still at the core of modern AI.

Lenny Rachitsky: Incredible. I have never heard that full story before. I love that it started with just two GPUs. And now it's, I don't know, hundreds of thousands, right, that are orders of magnitude more powerful.

Dr. Fei-Fei Li: Yep.

Lenny Rachitsky: And those two GPUs, they just bought them, they were like gaming GPUs, they just went to the-

Dr. Fei-Fei Li: Yes.

Lenny Rachitsky: ... game store that people use for playing games. As you said, this continues to be, in a large way, the way models get smarter. Some of the fastest growing companies in the world right now, and I've had most of them on the podcast, Mercor and Surge and Scale, continue to do this for labs: just give them more and more labeled data on the things they're most excited and interested in.

Dr. Fei-Fei Li: Yeah, I remember Alex Wang from Scale's very early days. I probably still have his emails from when he was starting Scale. He was very kind. He kept sending me emails about how ImageNet inspired Scale. I was very pleased to see that.

Lenny Rachitsky: One of my other favorite takeaways from what you just shared is that it's such an example of high agency and just doing things. It's kind of a meme on Twitter: you can just do things. You were just like, okay, this is probably necessary to move AI. And it was called machine learning back then, right? Was that the term most people used?

Dr. Fei-Fei Li: I think the terms were used interchangeably. It's true. I do remember the tech companies, and I am not going to name names, but I was in a conversation in one of the early days, I think in the middle of 2015, middle of 2016, and some tech companies avoided using the word AI because they were not sure if AI was a dirty word. And I remember I was actually encouraging everybody to use the word AI, because to me it is one of the most audacious questions humanity has ever asked in our quest for science and technology, and I feel very proud of this term. But yes, at the beginning some people were not sure.

Lenny Rachitsky: What year was that, roughly, when AI was a dirty word?

Dr. Fei-Fei Li: 2016, I think, because that was-

Lenny Rachitsky: 2016, less than 10 years ago.

Dr. Fei-Fei Li: That was the turning point. Some people started calling it AI, but I think if you look at the Silicon Valley tech companies and trace their marketing terms, 2017-ish was the beginning of companies calling themselves AI companies.

Lenny Rachitsky: That's incredible. Just how the world has changed.

Dr. Fei-Fei Li: Yes.

Lenny Rachitsky: Now, you can't not call yourself an AI company.

Dr. Fei-Fei Li: I know.

Lenny Rachitsky: Just nine-ish years later.

Dr. Fei-Fei Li: Yeah.

Lenny Rachitsky: Oh, man. Okay. Is there anything else around that early history that you think people don't know and that you think is important, before we chat about where you think things are going and the work that you're doing?

Dr. Fei-Fei Li: I think, as with all histories, I'm keenly aware that I am recognized for being part of the history, but there are so many heroes and so many researchers. We're talking about generations of researchers. In my own world, there are so many people who have inspired me, which I talked about in my book. I do feel our culture, especially Silicon Valley, tends to assign achievements to a single person. While I think that has value, it should be remembered: AI is a field that is, at this point, 70 years old, and we have gone through many generations. No one could have gotten here by themselves.

Lenny Rachitsky: Okay, so let me ask you this question. It feels like we're always on this precipice of AGI, this kind of vague term people throw around: AGI is coming, it's going to take over everything. What's your take on how far you think we might be from AGI? Do you think we're going to get there on the current trajectory we're on? Do you think we need more breakthroughs? Do you think the current approach will get us there?

Dr. Fei-Fei Li: Yeah, this is a very interesting term, Lenny. I don't know if anyone has ever defined AGI. There are many different definitions, ranging from some kind of superpower for machines all the way to machines that can become economically viable agents in society. In other words, earning salaries to live. Is that the definition of AGI? As a scientist, I take science very seriously, and I entered the field because I was inspired by this audacious question of, can machines think and do things the way that humans can? For me, that's always been the north star of AI. And from that point of view, I don't know what the difference is between AI and AGI.

I think we've done very well in achieving parts of the goal, including conversational AI, but I don't think we have completely conquered all the goals of AI. And as for our founding fathers, I wonder, if Alan Turing were around today and you asked him to contrast AI versus AGI, he might just shrug and say, "Well, I asked the same question back in the 1940s." So I don't want to go down a rabbit hole of defining AI versus AGI. As a scientist and technologist, I feel AGI is more a marketing term than a scientific term. AI is my north star, is my field's north star, and I'm happy for people to call it whatever name they want.

Lenny Rachitsky: So let me ask you maybe this way. Like you described, there are these components that took us from ImageNet and AlexNet to where we are today: GPUs, data, labeled data, and the model's algorithm. The transformer also feels like an important step in that trajectory. Do you feel like those are the same components that'll get us to, I don't know, a 10-times-smarter model, something that's life-changing for the entire world? Or do you think we need more breakthroughs? I know we're going to talk about world models, which I think is a component of this, but is there anything else where you think, oh, this will plateau, or, okay, this will take us there, we just need more data, more compute, more GPUs?

Dr. Fei-Fei Li: Oh no, I definitely think we need more innovations. I think with scaling laws, more data, more GPUs, and bigger versions of current model architectures, there's still a lot to be done, but I absolutely think we need to innovate more. There's not a single deeply scientific discipline in human history that has arrived at a place that says, we're done, we're done innovating, and AI is one of the, if not the, youngest disciplines in human civilization in terms of science and technology. We're still scratching the surface. For example, like I said, we're going to segue into world models. Today, you take a model and run it through a video of a couple of office rooms and ask the model to count the number of chairs. This is something a toddler could do, or maybe an elementary school kid could do, and AI could not do that, right?

So there's just so much AI today could not do, let alone thinking about how someone like Isaac Newton looked at the movements of the celestial bodies and derived an equation, or a set of equations, that governs the movement of all bodies. That level of creativity, extrapolation, abstraction, we have no way of enabling AI to do that today. And then let's look at emotional intelligence. Think of a student coming to a teacher's office to have a conversation about motivation, passion, what to learn, what's the problem that's really bothering you. In that conversation, as powerful as today's conversational bots are, you don't get that level of emotional, cognitive intelligence from today's AI. So there's a lot we can do better, and I do not believe we're done innovating.

Lenny Rachitsky: Demis had this really interesting interview recently, from DeepMind slash Google, where someone asked him, "What do you think, how far are we from AGI? What does it look like getting there?" He had a really interesting way of approaching it: if we were to give the most cutting-edge model all the information up until the end of the 20th century, see if it could come up with all the breakthroughs Einstein had. And so far we're nowhere near that, but they could just-

Dr. Fei-Fei Li: No, we're not. In fact, it's even worse. Let's give AI all the data, including modern instruments' data on celestial bodies, which Newton did not have, and just ask AI to create the 17th-century set of equations on the laws governing the movement of bodies. Today's AI cannot do that.

Lenny Rachitsky: All right. We're a ways away is what I'm hearing.

Dr. Fei-Fei Li: Yeah.

Lenny Rachitsky: Okay, so let's talk about world models. To me, this is just another really amazing example of you being ahead of where people end up. You were way ahead on, okay, we just need a lot of clean data for AI and neural networks to learn. You've been talking about this idea of world models for a long time, and you started a company to build one. There are language models; this is a different thing, a world model. We'll talk about what that is. And now, as I was preparing for this, Elon's talking about world models, Jensen's talking about world models, and I know Google's working on this stuff. You've been at this for a long time, and you actually just launched something, right before this podcast airs, that we're going to talk about. So what is a world model? Why is it so important?

Dr. Fei-Fei Li: I'm very excited to see that more and more people are talking about world models, like Elon, like Jensen. I have been thinking about how to push AI forward all my life, and the large language models that came out of the research world and then OpenAI and all this, for the past few years, were extremely inspiring, even for a researcher like me. I remember when GPT-2 came out, and that was in, I think, late 2020. I was co-director, I still am, but I was at that time full-time co-director of Stanford's Human-Centered AI Institute. The public was not aware of the power of large language models yet, but as researchers, we were seeing it, we were seeing the future, and I had pretty long conversations with my natural language processing colleagues like Percy Liang and Chris Manning. We were talking about how critical this technology was going to be, and the Stanford Human-Centered AI Institute, HAI, was the first to establish a full research center on foundation models.

Percy Liang and many researchers led the first academic paper on foundation models. So it was just very inspiring for me. Of course, I come from the world of visual intelligence, and I was thinking there's so much we can push forward beyond language, because humans use our sense of spatial intelligence, our understanding of the world, to do so many things, and they are beyond language. Think about a very chaotic first-responder scene, whether it's a fire or some traffic accident or some natural disaster. If you immerse yourself in those scenes and think about how people organize themselves to rescue people, to stop further disasters, to put down fires, a lot of that is movement, is spontaneous understanding of objects, worlds, human situational awareness. Language is part of that, but in a lot of those situations, language cannot get you to put down the fire.
So what is that? I was thinking a lot. In the meantime, I was doing a lot of robotics research, and it dawned on me that the linchpin connecting these additional intelligences, language, embodied AI (which is robotics), and visual intelligence, is the sense of spatial intelligence, of understanding the world. In 2024, I gave a TED talk about spatial intelligence and world models, and I had started formulating this idea back in 2022 based on my robotics and computer vision research. One thing that was really clear to me is that I really wanted to work with the brightest technologists and move as fast as possible to bring this technology to life. And that's when we founded this company called World Labs. You can see the word world is in the title of our company, because we believe so much in world modeling and spatial intelligence.

Lenny Rachitsky: People are so used to chatbots, and that's a large language model. A simple way to understand a world model is: you basically describe a scene, and it generates an infinitely explorable world. We'll link to the thing you launched, which we'll talk about, but is that a simple way to understand it?

Dr. Fei-Fei Li: That's part of it, Lenny. I think a simple way to understand a world model is that this model allows anyone to create any world in their mind's eye by prompting, whether with an image or a sentence, and also to interact in this world, whether you are browsing and walking or picking objects up or changing things, as well as to reason within this world. For example, if the agent consuming the output of the world model is a robot, it should be able to plan its path and help to tidy the kitchen. So a world model is a foundation that you can use to reason, to interact, and to create worlds.

Lenny Rachitsky: Great. Yeah. So robots feel like potentially the next big focus for AI researchers, and for AI's impact on the world. And what you're saying here is that this is a key missing piece of making robots actually work in the real world: understanding how the world works.

Dr. Fei-Fei Li: Yeah. Well, first of all, I do think there's more than robots. That's exciting. But I agree with everything you just said. I think world modeling and spatial intelligence is a key missing piece of embodied AI. I also think let's not underestimate that humans are embodied agents, and humans can be augmented by AI's intelligence. Just like today, humans are language animals, but we're very much augmented by AI helping us do language tasks, including software engineering. We shouldn't underestimate, or maybe we tend not to talk about, how humans, as embodied agents, can actually benefit as much from world models and spatial intelligence models as robots can.

Lenny Rachitsky: So the big unlocks here: robots, which is a huge deal if this works out, imagine each of us has robots doing a bunch of stuff for us, helping us with disasters, things like that. Games, obviously, are a really cool example, infinitely playable games that you just invent out of your head. And then creativity, just having fun, being creative, thinking up magical, wild new worlds and environments.

Dr. Fei-Fei Li: And also design: humans design everything from machines to buildings to homes. And also scientific discovery. There is so much. I like to use the example of the discovery of the structure of DNA. One of the most important pieces in the history of DNA's discovery is the X-ray diffraction photo captured by Rosalind Franklin. It was a flat 2D photo of a structure that looks like a cross with diffractions; you can google those photos. But with that 2D flat photo, two important humans, James Watson and Francis Crick, in addition to their other information, were able to reason in 3D space and deduce the highly three-dimensional double helix structure of DNA. And that structure cannot possibly be deduced in 2D. You cannot think in 2D and deduce that structure. You have to think in 3D space, using human spatial intelligence. So I think even in scientific discovery, spatial intelligence, or AI-assisted spatial intelligence, is critical.

Lenny Rachitsky: This is such an example of, I think it was Chris Dixon who had this line, that the next big thing is going to start off feeling like a toy. When ChatGPT first came out, I remember Sam Altman just tweeted it as, "Here's a cool thing we're playing with, check it out." Now it's the fastest growing product in all of history; it changed the world. And it's oftentimes the things that just look like, okay, this is cool, this is fun to play with, that end up changing the world most.

I reached out to Ben Horowitz, who loves what you're doing, a big fan of yours. They're investors, I believe, in...

Dr. Fei-Fei Li: Yeah, we've known each other for many years, but yes, right now they're investors in World Labs.

Lenny Rachitsky: Amazing. Okay, so I asked him what I should ask you about, and he suggested I ask you why the bitter lesson alone is not likely to work for robots. So first of all, just explain what the bitter lesson was in the history of AI, and then why that won't get us to where we want to be with robots.

Dr. Fei-Fei Li: Well, first of all, there are many bitter lessons, but the bitter lesson everybody refers to is a paper written by Richard Sutton, who won the Turing Award recently and does a lot of reinforcement learning. Richard said that if you look at the history, especially the algorithmic development, of AI, it turns out a simpler model with a ton of data always wins at the end of the day over a more complex model with less data. This paper came years after ImageNet. That, to me, was not bitter; it was a sweet lesson. That's why I built ImageNet: because I believed that big data plays that role. So why can't the bitter lesson alone work in robotics? Well, first of all, I think we need to give credit to where we are today. Robotics is very much in the early days of experimentation.

The research is not nearly as mature as, say, language models. So many people are still experimenting with different algorithms, and some of those algorithms are driven by big data. So I do think big data will continue to play a role in robotics, but what's hard for robotics comes down to a couple of things. One is that it's harder to get data. It's a lot harder to get data. You can say, well, there's web data, and this is where the latest robotics research is using web videos. I think web videos do play a role. But if you think about what made language models work... As someone who does computer vision and spatial intelligence and robotics, I'm very jealous of my colleagues in language, because they had this perfect setup where their training data are words, eventually tokens, and they produce a model that outputs words.
So you have this perfect alignment between what you hope to get, which we call the objective function, and what your training data looks like. But robotics is different. Even spatial intelligence is different. You hope to get actions out of robots, but your training data lacks actions in 3D worlds, and that's what robots have to do, right? Actions in 3D worlds. So you have to find different ways to fit, what do they call it, a square peg in a round hole, because what we have is tons of web videos. So then we have to start talking about supplementing the data, such as with teleoperation data or synthetic data, so that robots can be trained under this hypothesis of the bitter lesson, which is a large amount of data. I think there's still hope, because even what we are doing in world modeling will really unlock a lot of this information for robots.
But I think we have to be careful, because we're in the early days of this, and the bitter lesson is still to be tested because we haven't fully figured out the data part. Another part of the bitter lesson for robotics that I think we should be realistic about is, again, that compared to language models or even spatial models, robots are physical systems. So robots are closer to self-driving cars than to a large language model, and that's very important to recognize. It means that in order for robots to work, we not only need brains, we also need the physical body. We also need application scenarios. If you look at the history of the self-driving car, my colleague Sebastian Thrun took Stanford's car to win the first DARPA challenge in 2005 or 2006. It's been 20 years from that prototype of a self-driving car being able to drive 130 miles in the Nevada desert to today's Waymo on the streets of San Francisco.
And we're not even done yet. There's still a lot left. So that's a 20-year journey. And self-driving cars are much simpler robots: they're just metal boxes running on 2D surfaces, and the goal is not to touch anything. A robot is a 3D thing moving in a 3D world, and the goal is to touch things. So the journey is going to have many aspects and elements. And of course one could say, well, the self-driving car's early algorithms were from the pre-deep-learning era, and deep learning is accelerating the brains. I think that's true. That's why I'm in robotics, that's why I'm in spatial intelligence, and I'm excited by it. But in the meantime, the car industry is very mature, and productizing also involves mature use cases, supply chains, the hardware. So I think it's a very interesting time to work on these problems. But it's true, Ben is right. We might still be subject to a number of bitter lessons.

Lenny RachitskyDoing this work, do you ever just feel awe for the way the brain works and is able to do all of this for us? Just the complexity required to get a machine to walk around and not hit things or fall, does that give you more respect for what we've already got?

Dr. Fei Fei LiTotally. We operate on about 20 watts. That's dimmer than any light bulb in the room I'm in right now. And yet we can do so much. So I think actually the more I work in AI, the more I respect humans.

Lenny RachitskyLet's talk about this product you just launched. It's called Marble, a very cute name. Talk about what this is, why this is important. I've been playing with it, it's incredible. We'll link to it for folks to check it out. What is Marble?

Dr. Fei Fei LiYeah, I'm very excited. So first of all, Marble is one of the first products that World Labs has rolled out. World Labs is a frontier foundation model company. We were founded by four co-founders who have deep technical histories: my co-founders are Justin Johnson, Christoph Lassner, and Ben Mildenhall. We all come from the research fields of AI, computer graphics, and computer vision, and we believe that spatial intelligence and world modeling are as important as, if not more important than, language models, and complementary to them. So we wanted to seize this opportunity to create a deep-tech research lab that can connect the dots between frontier models and products. So Marble is an app that's built upon our frontier models. We've spent a year and more building the world's first generative model that can output genuinely 3D worlds. That's a very, very hard problem.

And it was a very hard process, and we have a founding team of incredible technologists from incredible teams. And then, just a month or two ago, we saw for the first time that we could prompt with a sentence, an image, or multiple images and create worlds that we can just navigate in. If you put on goggles, which we have an option to let you do, you can even walk around. Even though we've been building this for quite a while, it was still just awe-inspiring, and we wanted to get it into the hands of people who need it. And we know that so many creators, designers, people thinking about robotic simulation, people thinking about different use cases of navigable, interactable, immersive worlds, and game developers will find this useful. So we developed Marble as a first step. Again, it's still very early, but it's the world's first model doing this, and it's the world's first product that allows people to just prompt, we call it prompt-to-worlds.

Lenny RachitskyWell, I've been playing around with it. It is insane. You could just have a little Shire world where you infinitely walk around Middle-earth, basically, and there's no one there yet, but it's insane. You can go anywhere. There's a dystopian world. I'm just looking at all these examples, and my favorite part, actually, I don't know if it's a feature or a bug: you can see the dots of the world before it actually renders with all the textures. And I just love that you get a glimpse into what is going on with this model, basically-

Dr. Fei Fei LiThat is so cool to hear, because this is where, as a researcher, I am learning. The dots that lead you into the world are an intentional feature visualization; they're not part of the model. The model actually just generates the world. But we were trying to find a way to guide people into the world, and a number of engineers worked on different versions, and we converged on the dots. And so many people, you're not the only one, told us how delightful that experience is. It was really satisfying for us to hear that this intentional visualization feature, which is not part of the big hardcore model, actually has delighted our users.

Lenny RachitskyWow. So you added that to make it more, like, to have humans understand what's going on-

Dr. Fei Fei LiTo have fun, yes.

Lenny Rachitsky... and more delightful. Wow, that is hilarious. It makes me think about LLMs and the way they, it's not the same thing, but they talk about what they're thinking and what they're doing.

Dr. Fei Fei LiYes, it is. It is.

Lenny RachitskyIt also makes me think about just the Matrix. It's exactly the Matrix experience. I don't know if that was your inspiration.

Dr. Fei Fei LiWell, like I said, a number of engineers worked on that. It could be their inspiration.

Lenny RachitskyIt's in their subconscious. Okay, so for folks who may want to play around with this: what are some applications that folks can start using today? What's your goal with this launch?

Dr. Fei Fei LiYeah, so we do believe that world modeling is very horizontal, but we're already seeing some really exciting use cases. One is virtual production for movies, because what they need are 3D worlds that they can align with the camera, so when the actors are acting, they can position the camera and shoot the segments really well. And we're already seeing incredible use. In fact, I don't know if you have seen our launch video showing Marble. It was produced by a virtual production company; we collaborated with Sony, and they used Marble scenes to shoot those videos. So we were collaborating with those technical artists and directors, and they were saying this has cut their production time by 40x. In fact, it had to-

Lenny Rachitsky40X?

Dr. Fei Fei LiYes, in fact it had to, because we only had one month to work on this project, and there were so many things they were trying to shoot. So using Marble really, really significantly accelerated the virtual production for VFX and movies. That's one use case. We are already seeing users take a Marble scene, take the mesh export, and put it into games, whether games on VR or just fun games they have developed. We are also showing an example of robotic simulation, because when I was, I mean I still am, a researcher doing robotic training, one of the biggest pain points is creating synthetic data for training robots. This synthetic data needs to be very diverse. It needs to come from different environments with different objects to manipulate. And one path to it is to ask computers to simulate.

Otherwise, humans have to build every single asset for robots, and that's just going to take a lot longer. So we already have researchers reaching out and wanting to use Marble to create those synthetic environments. We also have unexpected user outreach in terms of how they want to use Marble. For example, a psychology team called us about using Marble for psychology research. It turns out that for some of the psychiatric patients they study, they need to understand how the patients' brains respond to immersive scenes with different features, for example, messy scenes or clean scenes, you name it. And it's very hard for researchers to get their hands on these kinds of immersive scenes; it would take them too long and too much budget to create them. Marble is an almost instantaneous way of getting so many of these experimental environments into their hands. So we're seeing multiple use cases at this point. But the VFX people, the game developers, the simulation developers, as well as designers, are very excited.

Lenny RachitskyThis is very much the way things work in AI. I've had other AI leaders on the podcast, and the advice is always: put things out there as early as you can to discover where the big use cases are. The head of ChatGPT told me how, when they first put out ChatGPT, he was just scanning TikTok to see how people were using it and all the things they were talking about, and that's what convinced them where to lean in and helped them see how people actually want to use it. I love this last use case for therapy. I'm just imagining heights, people dealing with fear of heights or snakes or spiders, which-

Dr. Fei Fei LiIt's amazing. A friend of mine literally called me last night and talked about his fear of heights and asked me if Marble could be used for that. It's amazing you went straight there.

Lenny RachitskyBecause I'm imagining all the exposure therapy stuff; this could be so good for that. That is so cool. Okay, I should have asked you this before, but I think there's going to be a question of how this differs from things like Veo 3 and other video generation models. It's pretty clear to me, but it might be helpful to explain how this is different from all the video AI tools people have seen.

Dr. Fei Fei LiWorld Labs' thesis is that spatial intelligence is fundamentally very important, and spatial intelligence is not just about videos. In fact, the world is not passively watching videos pass by. I love Plato's allegory of the cave as an analogy for vision. He said: imagine a prisoner tied to his chair, not very humane, in a cave, watching a live theater in front of him, but the actual live theater where the actors are acting is behind his back. It is lit so that the projection of the action falls on a wall of the cave. And the goal, the task of this prisoner, is to figure out what's going on. It's a pretty extreme example, but it really describes what vision is about: making sense of the 3D world, or 4D world, out of 2D. So spatial intelligence, to me, is deeper than only creating that flat 2D world.

Spatial intelligence, to me, is the ability to create, reason about, interact with, and make sense of a deeply spatial world, whether it's 2D or 3D or 4D, including dynamics and all that. So World Labs is focusing on that, and of course the ability to create videos per se could be part of this. In fact, just a couple of weeks ago, we rolled out the world's first demoable real-time video generation on a single H100 GPU. So part of our technology includes that. But I think Marble is very different, because we really want creators, designers, and developers to have in their hands a model that can give them worlds with 3D structure so they can use it for their work. And that's why Marble is so different.

Lenny RachitskyThe way I see it, it's a platform for a ton of opportunity to do stuff. As you described, videos are just like, here's a one-off video that's very fun and cool and you could... And that's it. That's it. And you move on.

Dr. Fei Fei LiBy the way, in Marble we allow people to export in video form. So you could actually, like you said, go into a world, let's say it's a hobbit cave. Especially as a creator, you have such a specific way of moving the camera along a trajectory in the director's mind, and then you can export that from Marble into a video.

Lenny RachitskyWhat does it take to create something like this? How big is the team, how many GPUs are you working with? Anything you can share there. I don't know how much of this is private information, but what does it take to create something like this that you've launched here?

Dr. Fei Fei LiIt takes a lot of brain power. We just talked about 20 watts per brain, so from that point of view, it's a small number, but it's actually incredible; it's half a billion years of evolution that gave us that power. We have a team of 30-ish people now, predominantly researchers and research engineers, but we also have designers and product people. We really believe in creating a company that's anchored in the deep tech of spatial intelligence but is also building serious products. So we have this integration of R&D and productization, and of course, we use a ton of GPUs.

Lenny RachitskyThat's the technical thing.

Dr. Fei Fei LiHappy to hear.

Lenny RachitskyWell, congrats on the launch. I know this is a huge milestone. I know this took a ton of work.

Dr. Fei Fei LiThank you.

Lenny RachitskySo I just want to say congrats to you and your team. Let me talk about your founder journey for a moment. So you're a founder of this company. You started how many years ago? A couple of years ago, two, three years ago?

Dr. Fei Fei LiA year ago.

Lenny RachitskyA year ago?

Dr. Fei Fei LiA year plus.

Lenny RachitskyA year? Okay. Wow.

Dr. Fei Fei LiProbably, 18 month, yeah.

Lenny RachitskyOkay. What's something you wish you knew before you started this, something you could whisper into the ear of the Fei-Fei of 18 months ago?

Dr. Fei Fei LiWell, I continue to wish I knew the future of technology. I actually think one of our founding advantages is that, in general, we see the future earlier than most people. But still, man, it is so exciting and so amazing, what's unknown and what's coming. But I know the reason you're asking me this question is not about the future of technology. Look, I did not start a company of this scale at 20 years old. I started a dry cleaner when I was 19, but that's a little smaller in scale.

Lenny RachitskyWe got to talk about that.

Dr. Fei Fei LiAnd then I led Google Cloud AI, and then I founded an institute at Stanford, but those are different beasts. I did feel I was a little more prepared as a founder for the grinding journey compared to, maybe, the 20-year-old founders. But I'm still surprised, and it puts me into paranoia sometimes, by how intensely competitive the AI landscape is, from the models and the technology itself to talent. When I founded the company, we did not have these incredible stories of how much certain talent would cost. So these are things that continue to surprise me and that I have to be very alert about.

Lenny RachitskySo the competition you're talking about is the competition for talent, and the speed at which things are moving.

Dr. Fei Fei LiYeah.

Lenny RachitskyYeah. You mentioned this point that I want to come back to: if you just look over the course of your career, you were at all of the major collections of humans that led to so many of the breakthroughs happening today. Obviously we talk about ImageNet, but also SAIL at Stanford, where a lot of the work happened, and Google Cloud, where a lot of the breakthroughs happened. What brought you to those places? For people looking for how to advance in their career and be at the center of the future, is there a through line of what pulled you from place to place and into those groups that might be helpful for people to hear?

Dr. Fei Fei LiYeah, this is actually a great question, Lenny, because I do think about it. Obviously, we talked about the curiosity and passion that brought me to AI; that is more a scientific north star, right? I did not care if AI was a thing or not, so that was one part. But how did I end up choosing the particular places I worked at, including starting World Labs? I think I'm very grateful to myself, or maybe to my parents' genes: I'm an intellectually very fearless person. And I have to say, when I hire young people, I look for that, because I think it's a very important quality if one wants to make a difference. When you want to make a difference, you have to accept that you're creating something new, or diving into something new that people haven't done before. And if you have that self-awareness, you almost have to allow yourself to be fearless and to be courageous.

So when I, for example, came to Stanford: in the world of academia, I was very close to this thing called tenure, which means having the job forever, at Princeton. But I chose to come to Stanford because... I love Princeton. It's my alma mater. It's just that at that moment there were people who were so amazing at Stanford, and the Silicon Valley ecosystem was so amazing, that I was okay taking the risk of restarting my tenure clock. Becoming the first female director of SAIL, I was, relatively speaking, a very young faculty member at that time, and I wanted to do it because I care about that community. I didn't spend too much time thinking about all the failure cases.
Obviously, I was very lucky that the more senior faculty supported me, but I just wanted to make a difference. And then going to Google was similar: I wanted to work with people like Jeff Dean and Geoff Hinton, all these incredible people. The same with World Labs. I have this passion, and I also believe that people with the same mission can do incredible things. So that's what guided my through line. I don't overthink all the possible things that can go wrong, because there are too many.

Lenny RachitskyI feel like an important element of this is not focusing on the downside, and focusing more on the people, the mission, what gets you excited, the curiosity.

Dr. Fei Fei LiYeah. I do want to say one thing to all the young talents in AI, the engineers and researchers out there, because some of you apply to World Labs, and I feel very privileged that you considered World Labs. I find that many young people today think about every single aspect of the equation when they decide on jobs. Maybe that's the way they want to do it, but sometimes I do want to encourage young people to focus on what's important. I find myself constantly in mentoring mode when I talk to job candidates, not necessarily recruiting or not recruiting, but just in mentoring mode, when I see an incredible young talent who is over-focusing on every minute dimension of considering a job, when maybe the most important things are: where's your passion? Do you align with the mission? Do you believe and have faith in this team? Just focus on the impact you can make and the kind of work and team you can work with.

Lenny RachitskyYeah, it's tough. It's tough for people in the AI space now. There's so much coming at them, so much new, so much happening, so much FOMO.

Dr. Fei Fei LiThat's true.

Lenny RachitskyI can see the stress. And so I think that advice is really important: what will actually make you feel fulfilled in what you're doing, not just where's the fastest-growing company, or who's going to win? I don't know. I want to make sure I ask you about the work you're doing today at Stanford, at the HCI. I think it's the-

Dr. Fei Fei LiHAI.

Lenny RachitskyHAI, Human-Centered AI Institute. What are you doing there? I know this is a thing you do on the side still.

Dr. Fei Fei LiSo yes, HAI, the Human-Centered AI Institute, was co-founded by me and a group of faculty, including Professor John Etchemendy, Professor James Landay, and Professor Chris Manning, back in 2018. I was actually finishing my last sabbatical at Google, and it was a very, very important decision for me, because I could have stayed in industry. But my time at Google taught me one thing: AI is going to be a civilizational technology. And it dawned on me how important this is to humanity, to the point that I wrote a piece in the New York Times that year, 2018, about the need for a guiding framework to develop and to apply AI. And that framework has to be anchored in human benevolence, in human-centeredness. And I felt that Stanford, one of the world's top universities in the heart of Silicon Valley, which gave birth to important companies from NVIDIA to Google, should be a thought leader that creates this human-centered AI framework and actually embodies it in our research, education, policy, and ecosystem work.

So I founded HAI. Fast-forward six or seven years, and it has become the world's largest AI institute doing human-centered research, education, ecosystem outreach, and policy work. It involves hundreds of faculty across all eight schools at Stanford, from medicine to education, to sustainability, to business, to engineering, to the humanities, to law. And we support researchers, especially in interdisciplinary areas, from the digital economy, to legal studies, to political science, to the discovery of new drugs, to new algorithms beyond transformers. We also put a very strong focus on policy, because when we started HAI, I realized that Silicon Valley did not talk to Washington, DC, or Brussels, or other parts of the world.
And given how important this technology is, we need to bring everybody on board. So we created multiple programs, from a congressional boot camp, to the AI Index report, to policy briefings, and we participated in policymaking, including advocating for a national AI research cloud bill that was passed during the first Trump administration, and participating in state-level AI regulatory discussions. So there's a lot we did, and I continue to be one of the leaders, even though I'm much less involved operationally, because I care not only that we create this technology but that we use it in the right way.

Lenny RachitskyWow. I was not aware of all that other work you were doing. As you were talking, I was reminded that Charlie Munger had this quote: "Take a simple idea and take it very seriously." I feel like you've done that in so many different ways and stayed with it, and the impact you've had over the years, in so many ways, is unbelievable. I'm going to skip the lightning round and just ask you one last question. Is there anything else you wanted to share? Anything you want to leave listeners with?

Dr. Fei Fei LiI am very excited by AI, Lenny. I want to answer one question that everybody asks me when I travel around the world: if I'm a musician, if I'm a middle school teacher, if I'm a nurse, if I'm an accountant, if I'm a farmer, do I have a role in AI, or is AI just going to take over my life or my work? I think this is the most important question about AI, and I find that in Silicon Valley we tend not to speak heart-to-heart with people, whether they are like us or not like us. We tend to just toss around words like infinite productivity, or infinite leisure time, or infinite power, or whatever. But at the end of the day, AI is about people. And when people ask me that question, it's a resounding yes: everybody has a role in AI.

It depends on what you do and what you want. But no technology should take away human dignity, and human dignity and agency should be at the heart of the development, the deployment, and the governance of every technology. So if you are a young artist and your passion is storytelling, embrace AI as a tool. In fact, embrace Marble; I hope it becomes a tool for you, because the way you tell your story is unique, and the world still needs it. How you use the most incredible tools to tell your story in the most unique way is important, and that voice needs to be heard. If you are a farmer near retirement, AI still matters, because you are a citizen. You can participate in your community; you should have a voice in how AI is used and how AI is applied.
I encourage all of you to use AI to make life easier for you. If you are a nurse, I hope you know that, at least in my career, I have worked a lot in healthcare research, because I feel our healthcare workers should be greatly augmented and helped by AI technology, whether it's smart cameras that feed them more information or robotic assistance. Our nurses are overworked and fatigued, and as our society ages, we need more help taking care of people. So AI can play that role. And I just want to say it's so important that even technologists like me are sincere about this: everybody has a role in AI.

Lenny RachitskyWhat a beautiful way to end it. Such a tie back to where we started: it's up to us to take individual responsibility for what AI will do in our lives. Final question: where can folks find Marble? Where can they go to maybe try to join World Labs if they want to? What's the website? Where do people go?

Dr. Fei Fei LiWell, the World Labs website is www.worldlabs.ai, and you can find our research progress there. We have technical blogs. You can find Marble, the product, there. You can sign in there, and you can find our job postings linked there. We're in San Francisco, and we'd love to work with the world's best talent.

Lenny RachitskyAmazing. Fei-Fei, thank you so much for being here.

Dr. Fei Fei LiThank you, Lenny.

Lenny RachitskyBye everyone.

Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.

章节 02 / 07

第02节

中文 译稿已完成

Lenny Rachitsky很多人都把你叫作 “AI 教母”。而你当年的那项工作,某种意义上真的把整个领域从 AI 寒冬里拉了出来。

Dr. Fei Fei Li在 2015 年中到 2016 年那段时间,一些科技公司甚至还会刻意回避 “AI” 这个词,因为他们不确定它是不是个负面标签。大概到 2017 年前后,才开始陆续有公司主动把自己叫作 AI 公司。

Lenny Rachitsky我记得你有句很经典的话,好像是在国会作证时说的:“AI 并没有什么是‘人工’的。它受人启发,由人创造,最重要的是,它影响的是人。”

Dr. Fei Fei Li这并不代表我觉得 AI 不会影响工作,或者不会影响人。恰恰相反,我一直相信,不管是现在还是未来,AI 会做成什么样,最终都取决于我们,取决于人。

我总体上相信技术对人类是净正向的,但我也认为每一种技术都是一把双刃剑。如果我们作为社会、作为个体,没有把事情做对,我们同样可能把它搞砸。

Lenny Rachitsky你当时有一个非常关键的突破性判断:机器其实可以像人一样学习,但它缺的是人类小时候通过成长获得的大量数据。

Dr. Fei Fei Li我选择从视觉智能的角度切入人工智能,因为人类本质上是非常依赖视觉的动物。

要让机器理解世界,就得让它尽可能看到大量关于物体的图像信息。但“物体”本身是极其难学的,因为同一个物体出现在图像里时,可能有无穷多种变化。
如果你想让计算机学习成千上万种物体概念,你就必须给它看数以百万计的例子。

Lenny Rachitsky今天的嘉宾是李飞飞博士,很多人都称她为 “AI 教母”。

在我们当下所处的这场 AI 革命里,她既是很多关键突破的推动者,也是这些突破的中心人物之一。她主导创建了 ImageNet。简单说,就是她很早就意识到,AI 要变聪明,需要海量而且标注干净的数据。而这个数据集后来成了推动今天这一路模型构建与扩展范式的关键突破。
她后来担任过 Google Cloud 的首席 AI 科学家,那也是很多早期重要技术突破发生的地方;她还曾担任斯坦福人工智能实验室 SAIL 的主任,许多最顶尖的 AI 研究者都从那里走出来。
她还是斯坦福 Human-Centered AI Institute 的共同创建者,这个机构在推动 AI 走向“以人为中心”的方向上发挥着非常重要的作用。她还当过 Twitter 董事,被《时代》评为 AI 领域最具影响力的 100 人之一,也曾进入联合国顾问委员会。总之,头衔还可以一直念下去。
在这次对谈里,飞飞会带我们快速回顾:AI 这一路是怎么走到今天的。
包括一个非常让人震撼的提醒:就在 9 到 10 年前,如果一家公司把自己叫作 AI 公司,几乎是在主动给品牌判死刑,因为当时根本没多少人相信 AI 真能跑通。
而今天,情况已经彻底反过来了。几乎每家公司都在说自己是 AI 公司。
我们还会聊她怎么看 AI 未来对人类社会的影响、现在这一路技术还会把我们带到多远、她为什么如此执着于 world model,以及 world model 到底是什么。最让人兴奋的是,随着这期播客上线,全球第一个 large world model `Marble` 也同步发布了。任何人都可以去 `marble.worldlabs.ai` 试玩,真的非常夸张,强烈建议你亲自去看。
飞飞的影响力巨大,但她在公众视野里其实一直显得过于低调了,所以我特别开心能请她来,也特别想把她的判断分享给更多人。
也要特别感谢 Ben Horowitz 和 Condoleezza Rice,给这次对话提供了很多很好的提问方向。
如果你喜欢这档播客,别忘了在你常用的播客应用或 YouTube 上订阅和关注。接下来,让我们先听一小段赞助信息,然后欢迎李飞飞博士。
本期节目由 Figma 赞助。下面的产品口播已压缩略过。
飞飞,非常感谢你来,也欢迎来到播客。

Dr. Fei Fei Li很高兴来到这里,Lenny。

Lenny Rachitsky我比你还兴奋,真的特别开心能和你聊。这次我有太多想聊的话题了。你在这场 AI 爆发的中心已经待了太久太久。我们会聊很多关于这段历史的事情,而我觉得其实很多人都并不了解这一切是怎么开始的。

不过在此之前,我想先读一段《Wired》对你的评价,先帮大家建立一点背景。完整介绍我会在开场里再补。
他们写道:“李飞飞属于那种极少数的科学家。人数可能少到围坐一张餐桌就够了。而正是这群人,推动了 AI 近些年惊人的飞跃。”
很多人都叫你 “AI 教母”。而且和不少 AI 领域领袖不同,你是个 AI 乐观主义者。你不觉得 AI 会取代我们,不觉得它会拿走所有工作,也不觉得它会毁灭人类。
所以我想从这里开始。你怎么看 AI 长期来看会怎样影响人类?

Dr. Fei Fei Li好,Lenny,我先把话说清楚:我不是一个乌托邦主义者。

我并不认为 AI 不会影响工作,也不认为它不会影响人。事实上,我更愿意把自己定义为一个 humanist,也就是“以人为本”的人。
我相信,无论是当下还是未来,AI 最终会做成什么样,都取决于我们,取决于人。
所以我总体上相信,技术对人类是净正向的。如果你把时间拉长,去看整个人类文明的轨迹,我们本质上就是一个不断创新的物种。几千年来,从最早的文字记录到今天,人类一直在改造自己,也一直在改造工具。通过这种过程,我们让生活变得更好,让工作变得更好,建造出文明,而我认为 AI 正是这个长期过程的一部分。
所以我的乐观来自这里。
但与此同时,我也一直认为,每一项技术都是双刃剑。如果我们作为一个物种、作为社会、作为社群、作为个体,没有把事情做对,我们同样也可能把它搞砸。

Lenny Rachitsky我记得你有句特别好的话,好像是在国会作证时说的:“AI 并没有什么是‘人工’的。它受人启发,由人创造,最重要的是,它影响的是人。”我没有具体问题,就是单纯觉得这句话太好了。

Dr. Fei Fei Li这确实是我的真心话。

我做 AI 已经有 25 年了,也带学生带了将近 20 年。几乎每一个从我实验室毕业的学生,在离开之前我都会提醒他们一句:你的领域叫 artificial intelligence,但它其实一点也不“人工”。

Lenny Rachitsky回到你刚才那个观点,既然最终方向还是取决于我们,那你觉得我们到底要把哪些事情做对?怎样才能把整个系统引到更好的路径上?我知道这题很大,但你会怎么回答?

Dr. Fei Fei Li我们有几个小时可以聊?

Lenny Rachitsky那就顺便把 “how do we align AI” 一起解决了吧。

Dr. Fei Fei Li我觉得,不管做什么,人都应该先成为负责任的个体。

这是我们教给孩子的事,也是成年人该遵守的事。无论你参与的是 AI 开发、AI 部署,还是 AI 应用,甚至大多数技术人其实会同时站在多个位置上,你都应该像一个有责任感的人那样去行动,而且真正认真对待这件事。
说得更直接一点:今天每个人都应该关心 AI,因为它会影响你的个人生活,会影响你的社区,会影响整个社会,也会影响下一代。
把 AI 当成一件与你切身相关的事情,并以一个负责任的人的姿态去面对它,我觉得这是第一步,也是最重要的一步。

Lenny Rachitsky那我们往后退一步,回到 AI 的起点。

大多数人真正开始听说、开始在意今天这个意义上的 AI,大概也就是最近几年,基本就是从 ChatGPT 出来之后开始的。也就三年前左右吧。

Dr. Fei Fei Li三年前。再过一个月,就整整三年了。

Lenny Rachitsky对,就是 ChatGPT 出来那个时点。你脑子里也会把它当成那个公共转折点吗?

Dr. Fei Fei Li是的。

Lenny Rachitsky那和我感受完全一样。

但其实很少有人知道,在这之前有一段非常非常长的历史。那时候大家更多叫它 machine learning,也有很多别的名字。后来才变成现在这样,什么都叫 AI。
在那之前,有很长时间一大群人一直在做这件事,然后又经历了大家所谓的 AI winter,也就是很多人几乎都放弃了,觉得这条路走不通。
而你做的工作,某种程度上就是把整个领域重新点燃的那个火种,也直接导致了今天这个“AI 成了所有人都在谈论的话题、将影响我们所做的一切”的世界。
所以我特别想请你从第一视角讲讲:在 ImageNet 出现之前,那个世界到底是什么样?你为什么会去做 ImageNet?它为什么那么重要?然后,这之后又发生了什么?

章节 03 / 07

第03节

中文 译稿已完成

Dr. Fei Fei Li对我来说,有时候真的很难记住:原来 AI 对大多数人来说还是这么新的东西。因为我的整个职业生涯几乎都活在 AI 里。

所以我心里有一部分会觉得特别满足。一个我在接近十几岁尾声时开始的个人兴趣,今天居然真的成长成了足以改变整个人类文明的力量。它确实是一种文明级技术。
如果从我自己的视角来看,这是一段差不多 20 多年、接近 30 年的旅程,而这条路径本身就很让人满足。
那一切是从哪开始的?
其实我都还不算第一代 AI 研究者。第一代真正可以追溯到上世纪 50、60 年代。更早一点,40 年代的 Alan Turing 就已经提出了那个远远超前于时代的问题:“机器能思考吗?”
当然,他当时还给出了一个很具体的检验方式,也就是通过对话去测试一台机器是否具备“思考能力”。按那个标准来看,今天我们某种意义上已经拥有了能思考的机器。
不过那更多还是一种启发性的起点。真正意义上的学科建立,是从 50 年代开始的。一批计算机科学家聚在一起,开始认真思考:我们能不能通过程序和算法,构造出那些原本只有人类认知能力才能完成的任务?
这就是 AI 学科的开端。
1956 年达特茅斯会议上,后来到斯坦福任教的 John McCarthy 教授提出了 `artificial intelligence` 这个词。
之后从 50 年代、60 年代、70 年代到 80 年代,整个领域都还处在 AI 早期探索阶段。那时有逻辑系统、专家系统,也有神经网络的早期尝试。
再往后,到 80 年代末、90 年代以及 21 世纪初,大概有 20 年左右的时间,我们开始进入 machine learning 时代。它本质上是计算机程序与统计学习的结合。
这次结合给 AI 带来了一个极其关键的认识:纯粹靠规则编写的程序,不足以覆盖我们想象中计算机应该拥有的大量认知能力。
所以我们必须让机器自己去学习模式。只有当机器能学会模式,它才有希望做更多事。
比如,如果你给机器看三只猫,我们希望的不只是它认出这三只猫,而是它能认出第四只、第五只、第六只,乃至所有别的猫。
这种学习能力,本来就是人类以及其他动物非常基础的能力。于是整个领域意识到:我们需要 machine learning。
这就是 21 世纪初之前的大背景。
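上面说的“给机器看三只猫,让它认出第四只”,用一个极简的玩具示例就能体会:先从少量样本里学出一个“模式”(这里用类别均值代替),再靠它对没见过的样本做判断。下面是一段纯属示意的 Python 小程序,特征数值完全是虚构的,只用来说明 machine learning 的基本思路,并不对应任何真实算法实现:

```python
# 玩具示例:从少量样本中学一个“模式”(类别均值),再对新样本泛化。
# 特征向量纯属虚构,仅示意“从样例学习、向新样例泛化”这件事。
def centroid(samples):
    """按维度求一组特征向量的均值,作为该类别学到的“模式”。"""
    n = len(samples)
    return [sum(x[i] for x in samples) / n for i in range(len(samples[0]))]

def classify(x, centroids):
    """把新样本归入欧氏距离平方最小的类别。"""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))

cats = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.15]]   # 三只“猫”的(虚构)特征
dogs = [[0.2, 0.9], [0.1, 0.8], [0.25, 0.95]]   # 三只“狗”的(虚构)特征
models = {"cat": centroid(cats), "dog": centroid(dogs)}
print(classify([0.85, 0.12], models))  # 第四只“猫”,模型应当把它归为 cat
```

真实的物体识别当然远比这复杂:正因为一个物体在图像上有近乎无限种呈现方式,才需要后文提到的海量数据和神经网络。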
而我是在 2000 年真正进入 AI 领域的,那一年我在 Caltech 开始读博士。
所以我算是最早那一代 machine learning researcher 之一。那时我们已经在研究 machine learning,尤其是神经网络。
我记得自己在 Caltech 上的第一批课里,就有一门叫 neural network。
但那时候做这些事其实非常痛苦,因为整个领域还处在所谓 AI winter 的正中央。公众并不怎么关注它,资金也不多,但与此同时,各种想法其实很多、也在不断流动。
我后来之所以会和现代 AI 的诞生靠得这么近,我觉得有两件事特别关键。
第一,我选择从 visual intelligence 的角度去看人工智能,因为人类本来就是高度依赖视觉的动物。这个我们后面还可以再展开聊。我们大量的智能,其实都建立在视觉、感知和空间理解之上,而不只是语言。语言和这些能力是互补的。
所以我选择研究视觉智能。
在读博和刚开始做教授的那些年里,我和学生们一直在围绕一个“北极星问题”推进,就是:解决 object recognition,也就是物体识别问题。
因为它是整个感知世界的基础模块。
我们在世界里移动、理解、推理、互动,基本上都是以“物体”为单位来做的。我们不会以分子层级和世界互动。
举个简单例子,如果你想端起一个茶壶,你不会想:“这个茶壶由一百块瓷片组成,我要一块一块来处理。”你会直接把它看作一个整体物体,再与之交互。
所以 object 这个概念极其重要。
我算是较早一批把它明确当成“北极星问题”的研究者之一。但当时真正发生的事情是:作为 AI 的学生和研究者,我当然也在做各种数学模型,神经网络、贝叶斯网络以及很多很多别的模型都做。
可这里面一直有一个非常具体的痛点:这些模型根本没有足够的数据可训练。
整个领域当时的注意力都高度集中在模型本身,可我越来越意识到,人类的学习过程,以及更长时间尺度上的进化,本质上其实都是一个“大数据学习过程”。
人类是在持续不断的经验中学习的。动物的进化,如果你把时间拉长来看,本质上也是在不断经历世界。
所以我和学生们当时有一个判断:真正让 AI 活起来、却被严重低估的关键成分,其实是 big data。
于是我们在 2006 到 2007 年左右启动了 ImageNet 项目。
我们的目标非常激进:想把整个互联网里和物体相关的图像数据都收集进来。
当然,那时的互联网规模比今天小很多,所以这个目标在当时听起来还不至于完全疯掉。放到今天,如果说一位教授加几个博士生想干这事,那简直就像妄想。
但我们当时真的这么做了。
我们非常仔细地整理出 1500 万张来自互联网的图片,建立了一个包含 22000 个概念的 taxonomy。这里面也借用了很多其他研究者的成果,比如语言学界的 `WordNet`,它本质上是一种词汇组织方式。
我们把这些东西整合起来,做成了 ImageNet,并把它开源给整个研究社区。我们还每年举办 ImageNet Challenge,鼓励更多研究者参与。
我们自己当然也继续做研究,但 2012 年,很多人会把那一年视为 deep learning 真正起飞、也就是现代 AI 诞生的时刻。
那一年,多伦多的一组研究者在 Geoffrey Hinton 教授带领下参加了 ImageNet Challenge。他们使用了 ImageNet 这套大数据,再加上 NVIDIA 的两张 GPU,第一次成功训练出一个神经网络算法,在物体识别问题上取得了巨大的突破。
它当然还没有“彻底解决”物体识别,但已经是一次巨大飞跃。
而这三样东西的组合,也就是:大数据、神经网络、GPU,后来就成了现代 AI 的黄金配方。
如果你再快进到 ChatGPT 这个 AI 进入公众视野的时刻,你会发现,技术底层依然还是这三样东西。
只不过今天的数据已经从当年的图像大数据变成了互联网规模、以文本为主的数据;模型结构也比 2012 年复杂得多,但依然是神经网络;算力则从当年的两张 GPU 变成了海量 GPU,但本质仍然是 GPU。
所以直到今天,这三样东西依然是现代 AI 的核心。

Lenny Rachitsky太不可思议了。我以前从没完整听过这段故事。我太喜欢“最开始只是两张 GPU”这个细节了。现在已经变成成千上万张,而且每一张都比当年强很多个数量级。

Dr. Fei Fei Li对。

Lenny Rachitsky而且那两张 GPU 还是他们自己买的,就是普通打游戏用的显卡,对吧?

Dr. Fei Fei Li是的。

Lenny Rachitsky就真的是去买游戏显卡回来跑。

而且你刚才说的这套路径,到今天依然是模型变聪明的重要方式。现在世界上增长最快的一批公司,像 Mercor、Surge、Scale,我几乎都请上过播客。他们还在继续为这些实验室做同一件事:不断提供更多、更高质量、模型最需要的标注数据。

Dr. Fei Fei Li对。我还记得 Alexandr Wang 在非常早期的时候给我发过邮件。我可能现在都还留着那些邮件。那时他刚开始做 Scale,一直很客气地和我说,ImageNet 对 Scale 的启发非常大。我看到这些当然会很高兴。

Lenny Rachitsky你刚才这段故事里,另一个我特别喜欢的点,是那种很强的 agency。就是“你其实可以直接去做这件事”。虽然现在 Twitter 上大家常拿这句话开玩笑,但你当时真的就是这样:你觉得如果 AI 要继续往前,就必须解决这个问题,于是就真的去做了。

那时大家更多还是叫它 machine learning,对吧?那是主流叫法吗?

Dr. Fei Fei Li我觉得当时两种说法是交替使用的。

而且这确实很有意思。我还记得,在 2015 年中到 2016 年中那段时间,我曾和一些科技公司的人聊过。当时我就明确感受到,有些公司会避免使用 “AI” 这个词,因为他们不确定这是不是个带负面意味的标签。
我就不点名了。
但我那时反而一直在鼓励大家使用 “AI” 这个词。因为对我来说,这是人类在科学与技术探索中提出过的最宏大、最有野心的问题之一,我对这个词是有感情的,也觉得它值得骄傲。
只不过在一开始,很多人确实并不确定。

Lenny Rachitsky那大概是哪一年?就是 “AI” 还是个脏词的时候。

Dr. Fei Fei Li我觉得差不多是 2016 年。

Lenny Rachitsky2016 年。也就是不到 10 年前。

Dr. Fei Fei Li对,那正好是一个转折期。有些人已经开始用 AI 这个词了。但如果你去看硅谷科技公司的 marketing 语言,我觉得大概要到 2017 年前后,才开始有更多公司真的把自己定义成 AI 公司。

Lenny Rachitsky这变化也太大了。

Dr. Fei Fei Li是的。

Lenny Rachitsky现在你反而几乎没法不把自己叫作 AI 公司了。

Dr. Fei Fei Li我知道。

Lenny Rachitsky短短九年左右而已。

Dr. Fei Fei Li对。

Lenny Rachitsky天啊。

好,在我们转去聊未来、以及你现在在做的事情之前,关于那段早期历史,还有什么是你觉得大众不太知道、但其实很重要的吗?

Dr. Fei Fei Li我觉得任何历史都是这样。

我当然知道,自己因为参与其中而被更多人记住,但这段历史里其实有太多英雄,也有太多研究者。我们讲的是一代又一代人的努力。
就拿我自己的经历来说,也有太多人启发过我,我在书里也写过其中很多人。
但我确实觉得,我们的文化,尤其是硅谷文化,很容易把一项成就归到单个人身上。这样做也许有它的传播价值,但我们还是要记住:AI 是一门已经有 70 年历史的学科,它经历了很多代研究者。没有任何一个人能独自把它带到今天。

Lenny Rachitsky那我换个角度问。

现在大家总是在讨论 AGI,好像我们永远站在“马上就要到了”的门槛上。这个词本身其实也很模糊,大家都在说 AGI 要来了,它要接管一切。
你怎么看?你觉得我们离 AGI 还有多远?现在这条技术路径能走到那里吗?还是说还需要一些新的突破?

Dr. Fei Fei Li这是个非常有意思的词,Lenny。

因为我不确定世界上到底有没有人真正定义清楚过 AGI。它有很多不同版本的定义,有些人说的是一种具备超级能力的机器,有些人说的是机器能在社会中成为经济上可行的 agent,也就是能像人一样“挣工资养活自己”。
这就是 AGI 的定义吗?
作为科学家,我是非常认真对待科学定义的。我进入这个领域,本来就是被那个宏大的问题吸引的:机器能不能像人那样思考、那样行动?
对我来说,这一直就是 AI 的北极星。
从这个角度看,我其实不太确定 AI 和 AGI 到底差在哪。

章节 04 / 07

第04节


Dr. Fei Fei Li我觉得我们已经在 AI 目标的某些部分上做得很好了,比如对话式 AI;但我并不认为,我们已经把 AI 想解决的所有问题都解决了。

而且说实话,如果 AI 这门学科的“开山祖师”之一 Alan Turing 今天还在,你让他来区分 AI 和 AGI,我怀疑他大概只会耸耸肩说:“我在 1940 年代问的就是同一个问题。”
所以我并不太想掉进“AI 和 AGI 到底该怎么定义”的兔子洞。
作为科学家、也是技术工作者,我更倾向于觉得,AGI 更像一个营销词,而不是一个科学词。
对我来说,AI 才是我的北极星,也是整个领域的北极星。至于别人想怎么叫它,我其实并不介意。

Lenny Rachitsky那我换个方式问。

你刚才讲到,从 ImageNet、AlexNet 一路走到今天,背后有几个关键要素:GPU、数据、标注数据,还有模型本身的算法。后面 transformer 似乎又是这条路径上的另一个重要台阶。
你觉得,接下来如果我们想做出比现在强 10 倍、真正会改变整个世界的模型,依然还是靠这些东西吗?还是说我们需要新的突破?
我知道后面我们会聊 world model,我猜那是其中一部分。但除此之外,你会不会觉得:靠“更多数据、更多算力、更多 GPU”这条路,可能会撞到平台期?

Dr. Fei Fei Li我非常明确地认为,我们还需要更多创新。

当然,继续 scale 更多数据、更多 GPU、把现有模型架构做得更大,这条路上仍然还有很多事可做;但与此同时,我也绝对相信,我们必须继续发明新的东西。
在人类历史上,没有哪一门真正深刻的科学学科,会走到某个节点然后说:“好了,我们已经研究完了,不需要再创新了。”
而 AI 从科学与技术史的角度看,几乎是人类文明里最年轻的学科之一。我们现在其实才刚刚触及皮毛。
比如刚才我们要聊到 world model。今天你给一个模型看几段办公室视频,然后让它数一数房间里有多少把椅子,这种事情一个幼儿,或者最多一个小学生就能做,但 AI 现在往往还做不好。
也就是说,今天的 AI 还有太多太多做不到的事情。
更不用说像牛顿那样,看到天体运动之后,抽象出一整套描述万有引力和物体运动规律的方程组。那种层级的创造力、外推能力、抽象能力,我们今天根本还不知道怎么让 AI 具备。
再看情感智能。
比如一个学生走进老师办公室,聊自己的动力、热情、该学什么、眼下到底被什么问题困住了。就算今天的对话机器人已经很强了,它依然给不了那种层次的情绪理解和认知理解。
所以,我们还有很多地方可以做得更好,我完全不相信我们已经接近“创新结束”的时刻。

Lenny RachitskyDemis 最近有个采访很有意思。有人问他:“你觉得我们离 AGI 还有多远?一路过去会是什么样?”

他的回答方式很有意思。他说,如果我们把截至 19 世纪末、也就是爱因斯坦之前的所有信息都喂给今天最先进的模型,看它能不能自己推导出像爱因斯坦那样的一系列突破,那答案显然还远远不行。

Dr. Fei Fei Li对,远远不行。甚至更糟。

不如我们更进一步:把包括现代观测仪器采集到的天体数据也一并给 AI,这些数据其实牛顿当年根本没有。然后你再让 AI 反推出 17 世纪那套描述天体运动规律的方程。
今天的 AI 还是做不到。

Lenny Rachitsky所以我听出来的是:还差得挺远。

Dr. Fei Fei Li对。

Lenny Rachitsky好,那我们来聊 world model。

对我来说,这又是一个你总是比别人更早一步看到方向的例子。当年你很早就意识到,AI 和神经网络要想真正学起来,需要大量高质量数据。后来你又很早开始讲 world model 这个方向,甚至还专门创办了一家公司去做它。
语言模型大家都懂了,但这显然是另一种东西。最近我在准备这期播客时,发现 Elon 在讲 world model,Jensen 也在讲,Google 也显然在做。可你已经在这条线上想了很久很久,而且就在这期播客上线前,你还正式发布了新东西。
所以,什么是 world model?它为什么这么重要?

Dr. Fei Fei Li我真的很高兴看到越来越多人开始认真谈论 world model,像 Elon、像 Jensen。

我这一生都在想一件事:怎样把 AI 继续往前推。
过去几年,从学术界一路到 OpenAI,large language model 的崛起对我这样的研究者来说也极具启发。
我还记得 GPT-2 出来时,大概是 2019 年底。那时我在斯坦福 `Human-Centered AI Institute` 做全职共同主任。虽然公众当时还没有真正意识到大语言模型的威力,但我们这些研究者已经看见它的未来了。
我当时和很多做 NLP 的同事,比如 Percy Liang、Chris Manning,都聊过很久。我们很清楚这项技术会非常关键。而斯坦福 HAI 也是最早建立 foundation model 完整研究中心的机构之一。
Percy Liang 和很多研究者后来共同主导了第一篇学术界关于 foundation model 的重要论文。
所以,对我来说,这一切都非常振奋。
但与此同时,我来自 visual intelligence 的世界,我一直在想:除了语言之外,我们其实还有很多东西可以继续往前推。
因为人类会依赖空间智能、对世界的理解去做大量事情,而这些能力并不只是语言能覆盖的。
想象一个非常混乱的现场:火灾、严重交通事故,或者某种自然灾害。你真的把自己放进那个场景里,会发现人类如何组织起来救人、阻止灾情进一步扩大、扑灭火势,其中很多能力都不是语言本身,而是对物体、空间、局势的即时理解,是一种 situational awareness。
语言当然是其中一部分,但在那样的场景里,光靠语言本身,不可能把火扑灭。
于是我开始追问:那这到底是什么能力?
那段时间我也做了很多 robotics research,然后我慢慢意识到,把语言之外的智能、具身智能(也就是机器人)以及视觉智能连接起来的那个关键环节,其实就是一种关于“理解世界”的空间智能。
我记得大概在 2024 年,我做过一场 TED Talk,主题就是 spatial intelligence 和 world model。而这套想法其实我在 2022 年左右就已经开始成形了,它是建立在我做机器人和计算机视觉研究的基础上的。
后来我越来越清楚一件事:我想和最优秀的技术人才一起,以尽可能快的速度把这项技术真正做出来。
于是我们创办了 World Labs。你从公司名字就能看出来,`world` 被直接写进了名字里,因为我们对 world modeling 和 spatial intelligence 这件事的信念非常强。

Lenny Rachitsky现在大家已经太习惯 chatbot 了,习惯把“大语言模型”当作 AI 的默认形态。

如果要用一个简单方式理解 world model,是不是可以说:你只要描述一个场景,它就能生成一个可以无限探索的世界?我们后面会放你们新产品的链接,但这样理解算不算基本准确?

Dr. Fei Fei Li这只是其中一部分,Lenny。

如果要用最简单的话来讲,world model 是这样一种模型:不管你给它的是一句话、一张图,还是别的提示,它都能帮助人把脑海中的世界真正生成出来。
而且你不仅能“看到”这个世界,还能在其中互动。你可以浏览、走动、拿起东西、改变其中的元素;你还可以在这个世界里做推理。
比如,如果消费这个 world model 输出的是一个机器人,那它应该能在这个世界里规划路径,甚至知道怎么去整理厨房。
所以 world model 本质上是一种基础能力,它让你能够创建世界、与世界互动、在世界里推理。

Lenny Rachitsky明白了。所以机器人看起来会是这个方向上一个特别大的落地场景。你这里的意思是:world model 正是让机器人真正理解现实世界、从而能在现实里工作的那块关键缺失拼图。

Dr. Fei Fei Li对,但我先补一句:让我兴奋的并不只有机器人。

不过你刚才那段总结,我基本都同意。world modeling 和 spatial intelligence,确实是 embodied AI,也就是具身智能里非常关键的一块缺失拼图。
但我也觉得,别低估“人类自己”作为 embodied agent 的重要性。人类本身就是具身智能体,而 AI 的能力也完全可以增强人类。
就像今天,人类是语言型动物,而 AI 已经在很多语言任务上增强了我们,比如软件工程。未来在世界模型和空间智能上,人类自己也会像机器人一样受益很多。

Lenny Rachitsky所以这里的大机会包括:机器人。如果这条路走通,那会非常巨大。想象一下,每个人都有机器人帮忙做很多事,甚至在灾难现场提供协助。

另外还有游戏,这显然也是一个很酷的场景。你脑子里想到什么世界,就能不断生成、不断探索。
再有就是创造力本身。单纯为了玩、为了创作、为了构思一些神奇而疯狂的新世界、新环境,这件事本身就已经很让人兴奋了。

Dr. Fei Fei Li还有设计。人类会设计机器、建筑、房屋;也包括科学发现。这里面其实还有太多可能性。

我特别喜欢拿 DNA 结构被发现这件事举例。
你去看 DNA 发现史里最关键的一部分,会发现 Rosalind Franklin 当年拍到那张著名的 X 射线衍射图,它本质上只是一张二维平面图,看起来像一个带衍射纹的十字。你去 Google 一下就知道那张图长什么样。
但就是凭这样一张二维图片,人类,尤其是 Watson 和 Crick,再结合其他信息,却能在三维空间里完成推理,最终推导出一个高度三维的 DNA 双螺旋结构。
那个结构不可能靠二维思维推出来。你必须用三维空间思考,也就是依赖人类的 spatial intelligence。
所以哪怕放到科学发现里,空间智能,或者 AI 辅助的空间智能,也一样是关键能力。

Lenny Rachitsky这又特别像 Chris Dixon 说过的一句话:下一个改变世界的大东西,刚开始看起来往往像个玩具。

ChatGPT 刚出来时,我记得 Sam Altman 只是很轻描淡写地发了一条推,说“这是我们在玩的一个小东西,大家试试看”。结果后来它成了人类历史上增长最快的产品之一,也真的改变了世界。
很多时候,最开始看上去只是“挺好玩”“挺有趣”的东西,最后反而最可能改变世界。

章节 05 / 07

第05节


(Sinch 赞助口播已压缩略过)

Lenny Rachitsky我专门去找了 Ben Horowitz,他非常欣赏你做的事情,也是你的忠实支持者。他们现在也是你们的投资人,对吧?

Dr. Fei Fei Li对,我们认识很多年了。现在他们确实也是 World Labs 的投资人。

Lenny Rachitsky太好了。我问他该让我问你什么,他给我的建议是:问问你,为什么只靠 `the bitter lesson`,并不足以把机器人带到我们真正想去的地方。

那你先解释一下,AI 历史里大家所说的 `bitter lesson` 到底是什么;然后再讲讲,为什么它单独用在机器人上是不够的。

Dr. Fei Fei Li首先,所谓 bitter lesson 其实不止一种说法,但大家通常引用的,是 Richard Sutton 那篇非常有名的文章。Richard Sutton 最近也拿了图灵奖,他主要做 reinforcement learning。

他那篇文章的核心意思大概是:回看 AI 尤其是算法发展的历史,你会发现,到了最后,往往都是“更简单的模型 + 海量数据”赢,而不是“更复杂的模型 + 更少数据”赢。
不过说实话,那篇文章是在 ImageNet 很多年后写出来的。对我来说,那不是 bitter lesson,反而更像 sweet lesson。因为我之所以去做 ImageNet,本来就是相信 big data 会扮演那个关键角色。
那为什么 bitter lesson 不能单独解决机器人问题?
首先,我觉得我们也得承认一点:机器人今天还处在非常早期的实验阶段。
这个领域的成熟度,远远比不上语言模型。现在大家还在尝试各种不同算法,其中有一些当然也是大数据驱动的。所以我也完全相信,大数据会继续在机器人里发挥作用。
但机器人真正难的地方有几个。
第一,就是数据比语言难拿得多,是真的难得多。
你当然可以说,网上也有数据,尤其现在很多机器人研究也在用 web video。我同意,视频数据确实会起作用。
但如果你看语言模型为什么这么“顺手”,作为做计算机视觉、空间智能和机器人的人,我其实很羡慕做语言的同事,因为他们有一个几乎完美的训练设定:训练数据本身就是词,后来是 token,而模型输出的也还是词。
也就是说,你想得到的目标和你的训练数据天然是对齐的。我们把这叫 objective function 和训练数据之间的对齐。
但机器人不一样,空间智能也不一样。
你希望机器人最终输出的是 action,也就是动作;可你的训练数据里,往往并没有足够丰富的“3D 世界里的动作”,而偏偏那正是机器人必须去做的事。
所以你就会遇到一种典型的 “square peg in a round hole” 问题:手里最多的是 web video,可真正要学的是在三维世界里行动。
因此我们就不得不考虑补更多别的数据,比如 teleoperation data,也就是遥操作数据,或者 synthetic data,让机器人尽可能在“大量数据”这个 bitter lesson 假设下去学。
我依然觉得这条路有希望,因为包括我们做的 world modeling,其实就能为机器人释放出很多原本缺失的信息。
但我觉得这里一定要谨慎,因为整个领域还很早,bitter lesson 在机器人里到底能不能完全成立,其实还没有被真正验证完。因为最根本的问题之一,就是我们还没有把“该给机器人什么数据”这件事彻底搞明白。
另外,机器人还有一个特别现实的地方:和语言模型,甚至纯空间模型相比,机器人是 physical system,也就是物理系统。
所以它更像自动驾驶,而不是大语言模型。这点非常重要。
这意味着,要让机器人真正可用,我们需要的不只是“大脑”,还要有“身体”,还要有落地场景。
你去看自动驾驶的发展史就知道了。我的同事 Sebastian Thrun 带着斯坦福的车在 2005 年赢得了 DARPA Grand Challenge。从那台能在内华达沙漠跑完 130 多英里赛程的原型车,到今天 Waymo 跑在旧金山街头,已经 20 年了。
而且这件事到今天都还没完全结束,还有很多工作要做。
所以自动驾驶本身就是一段 20 年的旅程。而自动驾驶车其实已经算“更简单的机器人”了:它只是一个在二维路面上移动的金属盒子,目标是尽量别碰到任何东西。
真正的机器人却是在三维世界里活动,而且目标恰恰是去接触东西、操作东西。
所以这条路必然会复杂得多,涉及的要素也更多。
当然,你也可以说,自动驾驶最早的算法还处在 pre-deep-learning 时代,所以 deep learning 现在确实在加速“大脑”这一部分。我完全同意,这也是为什么我会持续做机器人、做空间智能,并且对此非常兴奋。
但同时别忘了,汽车产业本身已经是一个非常成熟的产业。产品化不只是算法,还包括成熟的使用场景、供应链、硬件体系。
所以我觉得,现在正是做这些问题的一个非常有意思的时点。但 Ben 说得也没错:在机器人这条路上,我们大概还会遇到不止一次 bitter lesson。
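前面说的“objective function 和训练数据之间的对齐”,可以用一个极简的示意来体会:语言模型的训练语料本身就是 token 序列,而训练目标也只是预测下一个 token,输入与目标天然来自同一份数据。下面是一段假设性的 Python 玩具代码(与任何真实模型实现无关),演示如何把一段 token 序列直接切成(上下文 → 下一个 token)的训练对,这正是机器人领域所缺的那种“现成对齐”:

```python
# 玩具示例:语言模型的训练数据与训练目标天然对齐。
# 语料本身就是 token,目标只是预测下一个 token;
# 不像机器人,还得额外采集三维世界里的“动作”数据。
def next_token_pairs(tokens):
    """把一段 token 序列切成 (上下文, 下一个 token) 的训练对。"""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

corpus = ["机器", "能", "思考", "吗"]
for context, target in next_token_pairs(corpus):
    print(context, "->", target)
```

机器人则没有这种“免费”的训练对:网上最多的是视频,而要学的却是三维世界里的动作,所以才需要遥操作数据或合成数据来补。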

Lenny Rachitsky你做这些研究时,会不会越来越对人脑本身感到敬畏?只是为了让机器能走路、不撞东西、不摔倒,就已经这么复杂了。会不会反而让你更尊重我们原本已经拥有的这套系统?

Dr. Fei Fei Li完全会。

我们的大脑功耗大概只有 20 瓦,比我现在这个房间里任何一盏灯泡的功率都低,但它却能做这么多事情。
所以我其实越做 AI,就越尊重人类。

Lenny Rachitsky那我们聊聊你刚发布的这个产品,名字叫 `Marble`,还挺可爱的。

你讲讲它到底是什么,为什么重要。我自己已经玩过了,真的很夸张。我们也会把链接放出来,方便大家去试。那 Marble 到底是什么?

Dr. Fei Fei Li我非常兴奋。

首先,Marble 是 World Labs 推出的第一批产品之一。World Labs 是一家做 foundation frontier model 的公司,我们有四位联合创始人,大家都有很深的技术背景。我的几位 co-founder,Justin Johnson、Christoph Lassner 和 Ben Mildenhall,都来自 AI、计算机图形学和计算机视觉研究领域。
我们共同相信,spatial intelligence 和 world modeling 至少和语言模型一样重要,甚至可能更重要,而且它们和语言模型是互补关系。
所以我们想抓住这个机会,建立一个真正把 frontier model 和产品连起来的 deep tech research lab。
Marble 就是建立在我们前沿模型之上的一个应用。过去一年多,我们一直在做一件非常难的事情:构建世界上第一个能够真正输出三维世界的生成模型。
这件事非常难,整个过程也非常难。我们有一支非常强的 founding team,大家都来自非常厉害的团队。
大概一两个月前,我们第一次真正看到:只要输入一句话、一张图,甚至多张图,我们就能生成一个可以导航进入的世界。如果你在产品里打开对应选项,甚至还能直接在里面走动。
即便我们已经做这件事很久了,但第一次看到它真的跑起来时,依然非常震撼。
所以我们很想尽快把它交到真正需要它的人手里。我们知道,会有很多创作者、设计师、做机器人仿真的人、在思考各种可导航、可交互、沉浸式世界用例的人,以及游戏开发者,都会觉得这东西有价值。
所以 Marble 是我们迈出的第一步。它当然还很早,但它已经是世界上第一个能做这件事的模型,也是第一个允许人们真正“prompt to worlds”的产品。

Lenny Rachitsky我自己已经玩了,真的太离谱了。你可以随手做一个小小的夏尔世界,然后几乎可以无限在“中土”里走来走去。虽然里面还没有角色,但光是能去任何地方这件事就已经很疯狂了。

我还看了很多别的例子,比如反乌托邦世界之类。
而且我最喜欢的一点是,我甚至不知道这是 feature 还是 bug:在世界真正渲染出完整纹理前,你会先看到一堆点云。我特别喜欢那个瞬间,就好像你在偷看模型内部到底发生了什么。

Dr. Fei Fei Li听你这么说太好了,因为这正好是我作为研究者特别有意思的一次学习。

那个“点点把你引进世界”的效果,其实是我们故意设计的可视化 feature,不是模型本体的一部分。模型真正做的,只是生成世界。
但我们当时一直在想:怎么把用户自然地引导进那个世界里?于是有几位工程师做了不同版本,最后我们收敛到了这个 dot 方案。
而且很多人都跟你一样,说这个体验特别有趣、特别让人愉悦。所以对我们来说也很开心:这个并不属于底层 hardcore model 的可视化设计,居然真的打动了用户。

Lenny Rachitsky所以你们加这个,是为了让人更容易理解发生了什么?

Dr. Fei Fei Li是,也为了更好玩。

Lenny Rachitsky太好笑了。这让我想到 LLM 现在那些“展示自己在想什么”的方式。虽然不是一回事,但感觉有点像。

Dr. Fei Fei Li对,是有点那个意思。

Lenny Rachitsky也让我想到《黑客帝国》,那个感觉几乎一模一样。我不知道这是不是你们的灵感来源。

Dr. Fei Fei Li就像我说的,那部分是不同工程师一起做出来的,说不定真的是他们潜意识里的灵感。

Lenny Rachitsky好,那对于想上手试一试的人来说,今天已经有哪些实际场景能用起来?你们这次发布最核心的目标是什么?

Dr. Fei Fei Li我们确实相信 world modeling 是一个非常横向的平台能力。

但到现在为止,我们已经看到了几个特别令人兴奋的 use case。
比如电影的 virtual production,因为这类场景需要的是可以和摄像机对齐的 3D 世界。这样演员在里面表演时,导演和摄影团队才能更精准地摆机位、拍镜头。
我们已经看到一些非常惊人的应用了。其实你如果看过我们发布 Marble 的 launch video,就会知道那支视频本身就是由一家做 virtual production 的公司制作的。我们和 Sony 也有合作,他们就直接拿 Marble 生成的场景去拍那些视频。
所以我们和那些 technical artist、导演一起协作时,他们直接告诉我们:这套东西让他们的制作效率提升了 40 倍。

Lenny Rachitsky40 倍?

章节 06 / 07

第06节


Dr. Fei Fei Li对,事实上必须得有这么大提升。因为我们当时做这个项目只有一个月时间,但又有很多镜头要拍。所以 Marble 对 VFX 和电影的虚拟制作,确实起到了非常显著的加速作用。这是一个很明确的 use case。

我们现在还看到,已经有用户把 Marble 生成的场景导出来,拿 mesh export 去做游戏,不管是 VR 游戏,还是他们自己做的一些有趣的小型游戏。
我们也展示了一个机器人仿真的例子。因为我自己一直都在做机器人训练研究,而其中一个最大痛点,就是给机器人训练制造 synthetic data,也就是合成数据。
这些合成数据必须非常多样,得来自不同环境、包含不同可操作对象。其中一条路径,就是让计算机自己去模拟这些环境。
否则的话,就得靠人一项一项地给机器人手工搭建所有 asset,这会花非常非常多时间。
所以现在已经有研究者主动来找我们,希望用 Marble 生成这类 synthetic environment。
我们也收到了很多意料之外的用法反馈。比如有一个做心理学研究的团队联系了我们,他们想用 Marble 来做心理学实验。
原来他们研究的一些精神科患者,需要进入不同特征的沉浸式场景里,研究大脑对这些场景的反应。比如特别凌乱的场景、特别整洁的场景,或者各种别的环境。
而对研究者来说,自己去获得这些 immersive scene 非常困难,不仅时间太长,预算也太高。Marble 几乎能以接近即时的方式,把大量实验环境交到他们手里。
所以现在我们已经看到很多不同方向的 use case。不过目前最兴奋的几类用户,还是 VFX 团队、游戏开发者、仿真开发者,以及设计师。

Lenny Rachitsky这其实很像 AI 产品一贯的工作方式。之前我也请过很多 AI 公司的负责人,他们几乎都会说:东西要尽早放出去,越早越好,这样你才能尽快看清真正的大用例到底在哪。

ChatGPT 的负责人就跟我讲过,他们刚把 ChatGPT 放出来的时候,团队里的人就在疯狂刷 TikTok,看大家到底在怎么用它、在讨论什么。那反而帮他们看清了之后该往哪里加码,用户真正想怎么用这个产品。
我特别喜欢你刚才说的最后一个用法,就是心理治疗。比如恐高的人,或者怕蛇、怕蜘蛛的人,感觉它在这类 exposure therapy 里会特别有潜力。

Dr. Fei Fei Li太有意思了。昨晚就有个朋友给我打电话,专门跟我聊他对高处的恐惧,还问我 Marble 能不能用在这件事上。你居然第一反应就想到这里,太神了。

Lenny Rachitsky因为一想到 exposure therapy,我就觉得这太适合了,真的很酷。

对了,这个问题我本来应该更早问:Marble 和像 Veo 3 这样的 video generation model,到底有什么区别?
对我来说差异挺明显的,但我觉得对很多人来说,还是值得你用更清楚的方式解释一下。毕竟大家已经看过很多视频生成工具了。

Dr. Fei Fei LiWorld Labs 的核心判断是:spatial intelligence 是一件基础性的能力,而空间智能并不只是“做视频”。

事实上,我们并不是在被动地看一段视频从眼前流过去。柏拉图有一个很经典的“洞穴寓言”,我很喜欢拿它来解释视觉。
他描述的是:想象一个囚徒被绑在椅子上,坐在山洞里,看着面前的一整面墙。真正发生的戏剧表演其实在他身后,只是光把那些动作投影到了洞壁上,而这个囚徒要做的事,就是根据投影判断到底发生了什么。
这个例子虽然极端,但它很好地说明了视觉在干什么:我们其实是在从二维投影里,推断三维,甚至四维世界。
所以对我来说,spatial intelligence 的深度要远远超过“生成一个平面的二维画面”。
空间智能在我这里的定义,是能够生成、推理、互动,并理解一个真正具有空间结构的世界。这个世界可以是二维、三维,甚至四维的,还包括其中的动态变化。
所以 World Labs 真正聚焦的是这件事。生成视频当然也可以是其中一部分。
事实上,就在几周前,我们刚刚发布了世界上第一个可实时演示、而且只需要单张 H100 GPU 就能运行的实时视频生成能力。所以这也在我们的技术栈里。
但 Marble 之所以很不一样,是因为我们真正想交到创作者、设计师、开发者手里的,不只是“会出视频的模型”,而是一个能直接给他们 3D 结构化世界,让他们拿去工作的模型。这就是 Marble 的独特之处。

Lenny Rachitsky我自己的理解是,它更像一个“可供各种应用搭建的平台”。

而视频模型更像是:你做出一个很酷的一次性视频,然后就结束了,继续往下一个走。

Dr. Fei Fei Li顺便说一句,其实 Marble 里也完全可以导出视频。

比如你进入一个世界,假设是个霍比特洞穴。作为创作者,你脑子里会有一个非常明确的镜头运动轨迹,像导演一样知道摄像机该怎么走。你完全可以把那条路径在 Marble 里走出来,再导出成视频。

Lenny Rachitsky那做出这样一个东西,到底需要什么?团队有多大?用了多少 GPU?如果有能说的,你可以稍微讲讲。这个级别的产品,到底是怎样被做出来的?

Dr. Fei Fei Li它需要大量脑力。

刚才我们说过,一个脑子差不多 20 瓦,从数字上看好像不多,但其实非常惊人。那是 5 亿年进化才给我们的能力。
我们现在团队大概有 30 多人,主体还是 researchers 和 research engineers,但同时也有 designers 和 product。
我们非常相信,要做一家真正以 spatial intelligence 这种 deep tech 为根基的公司,但与此同时,我们也要认真地把产品做出来。
所以我们的组织方式,本质上就是把 R&D 和 productization 紧密地绑在一起。
当然,我们也用了非常多 GPU。

Lenny Rachitsky这就是我想听的技术部分。

Dr. Fei Fei Li很高兴你这么说。

Lenny Rachitsky总之,恭喜你们发布。我知道这一定是一个很大的 milestone,也知道这里面花了非常多的工夫。

Dr. Fei Fei Li谢谢。

Lenny Rachitsky所以我也想借这个机会恭喜你和你的团队。

接下来我想聊聊你的 founder journey。你作为这家公司创始人,大概是什么时候开始做的?两三年前?

Dr. Fei Fei Li一年多前。

Lenny Rachitsky一年多前?

Dr. Fei Fei Li对,一年多一点。

Lenny Rachitsky才一年?哇。

Dr. Fei Fei Li严格说大概 18 个月左右。

Lenny Rachitsky那如果让你回到 18 个月前,你最想对当时的李飞飞耳边悄悄说一句什么?

Dr. Fei Fei Li我大概还是会希望自己能提前知道技术的未来。

其实我觉得,这本来就是我们这类创始人的优势之一:我们通常比大多数人更早一点看到未来。
但即便如此,未来里未知的东西、即将到来的东西,还是让人觉得既兴奋又震撼。
不过我知道,你问的并不只是技术未来。
另外要说明一点,我不是在 20 岁时开始做这种规模的公司。虽然我 19 岁时确实开过一家干洗店,但那个规模就小多了。

Lenny Rachitsky这个故事我们得聊。

Dr. Fei Fei Li后来我创建过 Google Cloud AI,也在斯坦福创办过研究机构,但它们都和“创业公司”不是同一种动物。

所以相较于很多 20 岁就开始创业的人,我确实觉得自己在面对那种高强度、持续 grinding 的 founder journey 时,会更有一点准备。
但即便如此,我还是会被一件事不断震到,有时甚至会让我进入某种 paranoia 状态:那就是 AI 这个领域竞争到底有多激烈。
这种竞争既发生在模型和技术层面,也发生在人才层面。
而且当我们创办公司时,市场上还没有今天这种关于“某些顶尖人才到底能贵到什么程度”的夸张故事。
所以这些东西直到现在仍然让我感到意外,也逼着我必须始终保持非常高的警觉。

Lenny Rachitsky所以你说的竞争,既包括人才竞争,也包括整个技术变化的速度本身,对吧?

章节 07 / 07

第07节


Dr. Fei Fei Li对,我确实还想对所有年轻的 AI 人才说一句话,不管你是工程师还是研究者。因为你们当中有些人也申请了 World Labs,所以我其实一直觉得很荣幸,你们会认真考虑我们。

但我也注意到,今天很多年轻人在做职业选择时,会把一切都算得非常细,几乎把方程里的每个变量都要拆开来评估。
当然,这也许是他们想要的决策方式。
但有时候我还是很想鼓励年轻人,把注意力重新拉回更重要的东西上。因为我在和候选人交流时,经常不自觉就进入一种 mentoring mode,不一定是在招或不招,而更像是在看见一个很优秀的年轻人时,忍不住想提醒他:你是不是把太多注意力放在那些非常细小的维度上了?
也许真正更重要的问题是:你的热情到底在哪里?你和这个 mission 对齐吗?你信不信这支团队?你能不能在这里做出真正有影响力的事?以及,你能不能和这群人一起做你真正想做的工作?

Lenny Rachitsky对,尤其对 AI 领域的人来说,这真的很难。现在扑面而来的信息太多了,新东西太多了,变化太快了,FOMO 也太强了。

Dr. Fei Fei Li确实如此。

Lenny Rachitsky我完全能理解那种压力。所以我觉得你这个建议非常重要。真正该想的,也许不是“哪家公司增长最快”“谁会赢”,而是“什么事情会让我真的有成就感、有满足感”。

另外我一定想问问你现在在斯坦福做的事。是 HCI 吗?

Dr. Fei Fei Li是 HAI。

Lenny Rachitsky对,HAI,Human-Centered AI Institute。你现在在那边具体还在做什么?我知道这还是你一直在持续投入的事情。

Dr. Fei Fei Li对,HAI,也就是 Human-Centered AI Institute,是我和 John Etchemendy、James Landay、Chris Manning 等一批老师在 2018 年共同创立的。

那时我刚结束在 Google 的最后一个学术休假期。对我来说,那其实是一个非常重要的人生决定,因为我本来完全可以继续留在工业界。
但在 Google 的那段经历让我越来越清楚一件事:AI 会是一种文明级技术。
我也因此越来越强烈地意识到,这件事对整个人类有多重要。2018 年我甚至还专门在《纽约时报》写过一篇文章,谈为什么我们需要一套指导 AI 开发与应用的框架。而这套框架必须扎根在人类善意和 human-centeredness,也就是“以人为中心”之上。
我当时觉得,像斯坦福这样一所位于硅谷核心、孕育出从 NVIDIA 到 Google 等一系列关键公司的顶尖大学,应该成为这个 human-centered AI 框架的思想引领者,并且把它真正落实到研究、教育、政策和生态工作里。
所以我创办了 HAI。
快进到今天,六七年过去,它已经成长为全球最大的、以 human-centered 为核心的 AI 研究机构之一,覆盖研究、教育、生态外联和政策影响。
它集合了斯坦福全部七个学院的数百位 faculty,从医学、教育、可持续发展、商学院、工程、人文,到法学院,全都在其中。
我们支持很多研究者,尤其是跨学科方向上的工作:从 digital economy、法律研究、政治学,到新药发现,再到 transformer 之外的新算法。
我们还非常重视 policy。因为当初创办 HAI 时,我很清楚地看到:硅谷并没有真正和华盛顿、布鲁塞尔,或者世界其他地方形成有效对话。
但既然这项技术如此重要,我们就必须让更多人一起上桌。
所以我们做了很多项目:从 congressional bootcamp、AI Index Report,到 policy briefing。我们也深度参与了 policymaking,包括推动 national AI research cloud 相关法案,它在特朗普第一任期时得到了通过;我们也参与了州一级关于 AI 监管的讨论。
所以这几年我们做了很多事。虽然我现在在运营层面参与得没那么深了,但我依然是这个机构的负责人之一。因为我在意的不只是“把技术造出来”,还在意“我们有没有以正确的方式去使用它”。

Lenny Rachitsky哇,我之前其实并不知道你还做了这么多别的工作。

你刚才说着说着,我突然想到 Charlie Munger 那句很经典的话:“Take a simple idea and take it very seriously.”
我觉得你这些年就是在以各种不同方式做这件事,而且一直坚持了下来。你带来的影响真的很不可思议。
我准备直接跳过 lightning round,只问最后一个问题。你还有没有什么特别想留给听众的话?

Dr. Fei Fei Li我对 AI 真的充满兴奋,Lenny。

而且我特别想回答一个问题。这个问题是我到世界各地演讲时,几乎每个人都会问我的:
如果我是音乐家,如果我是中学老师,如果我是护士、会计、农民,我在 AI 时代还有没有角色?还是说 AI 会直接接管我的生活和工作?
我觉得,这是 AI 时代最重要的问题。
而我也发现,在硅谷,我们很少真正和人做那种 heart-to-heart 的对话。不管是和像我们这样的人,还是和不像我们这样的人。我们更习惯随手抛出一些词,比如无限生产力、无限闲暇、无限力量之类的。
但说到底,AI 最终还是关乎人。
所以当人们这样问我时,我的答案非常明确:有,而且每个人都有自己的位置。
当然,这会取决于你是谁、你想要什么。
但没有任何一种技术,应该夺走人的尊严。人的尊严与能动性,必须成为每一项技术在开发、部署和治理中的核心。
所以,如果你是一个年轻的艺术家,你热爱讲故事,那就把 AI 当成工具去拥抱它。事实上,也欢迎你去拥抱 Marble。我真心希望它能成为你的工具。因为你讲故事的方式是独一无二的,这个世界依然需要它。
但关键在于:你如何讲故事?你如何使用这样强大的工具,以你自己独特的方式去讲故事?你的声音仍然应该被听见。
如果你是一位接近退休的农民,AI 依然和你有关,因为你首先是公民。你可以参与自己的社区,也应该对 AI 如何被使用、如何被应用拥有发声权。
你也可以和身边的人一起,鼓励大家用 AI 让生活变得更轻松一些。
如果你是一名护士,我特别希望你知道:至少在我自己的职业生涯里,我在 healthcare research 上投入了非常多精力,因为我一直觉得,我们的医护工作者理应被 AI 更好地增强和支持。
不管是通过更智能的摄像系统,让他们获得更多信息;还是通过机器人辅助,因为护士真的长期处于过劳和过度疲惫的状态。而且随着社会老龄化,我们只会越来越需要更多帮助,才能把人照顾好。
所以 AI 完全可以在这里发挥作用。
我只是想认真地说一句:哪怕像我这样一个技术工作者,我也是真心相信,每个人在 AI 时代都有自己的角色。

Lenny Rachitsky这是一个特别美的收尾,也和我们一开始说的东西首尾呼应了:最终它会走向哪里,取决于我们,也取决于每个人愿不愿意承担自己的责任。

最后一个收尾问题:大家去哪里能找到 Marble?如果有人想申请加入 World Labs,又该去哪里?网站是什么?

Dr. Fei Fei LiWorld Labs 的网站是 [www.worldlabs.ai](https://www.worldlabs.ai)。

你可以在那里看到我们的研究进展、技术博客,也能找到 Marble 这个产品本身,可以直接注册登录。我们的招聘入口也在上面。
我们在旧金山,非常欢迎全球最优秀的人才加入我们。

Lenny Rachitsky太好了。飞飞,非常感谢你今天来。

Dr. Fei Fei Li谢谢你,Lenny。

Lenny Rachitsky大家拜拜。

感谢收听。如果你觉得这期内容有价值,欢迎在 Apple Podcasts、Spotify 或你常用的播客应用里订阅,也欢迎给我们打分或留言,这会帮助更多人发现这档节目。
想看往期节目或了解更多信息,可以去 `lennyspodcast.com`。我们下期见。
