The a16z Show

From Models to Mobility: Building Waymo with Dmitri Dolgov

17 Apr 2026

Transcription

Chapter 1: How does Waymo achieve fully autonomous rides?

0.773 - 23.288 Dmitri Dolgov

When you're driving around or being driven around, say, you know, we think about what we're building as a driver. I kind of imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving and what it means to be a good driver as opposed to a bad one.

23.656 - 39.268 Dmitri Dolgov

I would say that we've clearly moved past the stage of scientific research and deep core technology development to this new phase of accelerated global scaling and deployment.

40.278 - 48.987 Unknown

Waymo is now doing nearly half a million fully autonomous rides a week across multiple cities, a shift from long-term research to real-world scale.

49.808 - 67.247 Unknown

In this episode, originally aired on the Cheeky Pint podcast, Waymo co-CEO Dmitri Dolgov joins John Collison to break down how they built the system behind it, from the sensor stack and why LiDAR still matters, to the role of simulation and critic models in training the AI.

67.227 - 81.22 Unknown

They also get into why driver assist won't naturally evolve into full autonomy, what it takes to scale globally, and how the product itself is changing, from custom-built vehicles to entirely new economies of ride-hailing.

83.63 - 85.553 John Collison

Dmitri Dolgov is co-CEO of Waymo.

Chapter 2: What technology underpins the Waymo driver system?

85.934 - 101.098 John Collison

He joined Google's self-driving car project in 2009 as one of its first engineers and was repeatedly promoted until he took it over in 2021. Waymo is Google's most successful moonshot and now provides over 500,000 fully autonomous rides each week. Cheers, by the way. Yeah, cheers.

101.956 - 127.73 Dmitri Dolgov

You grew up in Russia, right? I grew up in Russia. Then it was actually the Soviet Union. Right, exactly. My dad is a physicist, so the Soviet Union started falling apart, and then he had a visiting position at Kyoto University for a year. We moved there as a family, and then he went to Berkeley, and I kind of tagged along. And then I graduated from high school.

128.01 - 134.721 Dmitri Dolgov

I was thinking about the next thing I wanted to do, and I really liked that opportunity, that technical school in Russia.

Chapter 3: How does sensor fusion work in self-driving cars?

134.741 - 140.693 Dmitri Dolgov

The Russians are serious about their physics. They are, they are. So I went back to Russia and I got my bachelor's and master's there.

140.713 - 149.59 John Collison

What year was this that you went back to Russia? 1994. Okay. So that was kind of almost peak Russian optimism in a sense where it was opening up.

149.61 - 172.202 Dmitri Dolgov

It was, it was. Yeah, yeah. No, actually I remember talking to my mom about it. And, of course, my parents grew up in the Soviet Union. They've seen it. They were born right before the war. And then they saw, you know, they lived through some really tough times. And I remember talking to my mom and saying, she, you know, in fact, I got my green card here in the U.S. before I went back.

172.242 - 177.79 Dmitri Dolgov

And she insisted that I do it. And I was actually at the time I wasn't thinking of coming back.

Chapter 4: Why is full autonomy different from driver-assist systems?

178.532 - 200.549 Dmitri Dolgov

But then I was pretty excited about where Russia is and the trajectory it's on. And, you know, being young and naive, I was like, there's no turning back. And so why did you decide to come back? There's more of a playback way than... studying math and computer science.

200.95 - 218.952 Dmitri Dolgov

And while the undergrad and master's that I got in physics and applied math, that I think was still an incredibly strong kind of foundational school of Russian math and science, graduate school, it was very clear to me that the best way to do it was in the US.

219.133 - 241.144 John Collison

So I came back. I'm struck by the founders of the two most valuable UK companies are Russian math nerds who both went to the same school, Nikolai at Revolut and Alex Gerko at XTX. But yeah, it's a strong diaspora.

241.563 - 247.11 Dmitri Dolgov

There is a company not far from here where one of the founders also has, you know, a similar pedigree.

248.452 - 271.3 John Collison

Right, a company that we're close to. Exactly. You know the classic engineering interview question of, you know, what happens when I type google.com and hit enter? And, you know, talk me through, you know, whatever you like, you know, HTTP and DNS and BGP. You can go down to whatever level of stack you want. Do you want to maybe just describe when I take a ride in a Waymo today,

271.28 - 274.103 John Collison

What's happening at a technical level? Like what is the architecture?

275.064 - 283.412 Dmitri Dolgov

Let me answer the question of what's happening in real time. But this is going to be only a part of the story because we're going to be talking about kind of the inference, the real time inference part of it.

284.193 - 306.858 Dmitri Dolgov

And if we want to have a deeper, richer technical conversation, I think it would be interesting also to zoom out and talk about the entire ecosystem of what goes into building, evaluating and deploying the Waymo driver. But when you're driving around or being driven around, say, you know, we think about what we're building as a driver. Obviously, it's not a car.

307.299 - 319.875 Dmitri Dolgov

So it has a number of sensors that are positioned around the vehicle. We use three different sensing modalities. There are cameras, there are lidars, or lasers, and there are radars.

Chapter 5: What challenges does Waymo face in scaling globally?

320.456 - 346.346 Dmitri Dolgov

You know, those are the primary ones. There are also microphones, directional microphone arrays. But those are the primary three for sensing the world. They all have very nicely complementary physical properties. They all have 360 degree coverage around the vehicle. So the Waymo driver sees kind of 360, you know, all the time. So all of the data goes into a computer, you would expect.

346.927 - 373.599 Dmitri Dolgov

And there's the software that processes it. Now it's, you know, all AI. You can see it as a specialized AI in the physical world. So it processes the sensor data. Nowadays you'd talk about it, using AI terminology, as encoders that take this data in. And then there's the decoder, the action, the generative part, if you will, in the car. And the generative task there is to figure out how to drive.

373.659 - 385.594 Dmitri Dolgov

And that is, of course, connected through a specialized interface to the car where we can actuate the vehicle. And that's why you see the steering wheel turn and it drives you around.
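
The flow described here (sensor data into encoders, a decoder that generates the driving action, and an interface that actuates the vehicle) can be pictured with a toy sketch. Everything in it is an assumption made up for illustration: the names `SensorFrame`, `Action`, `encode`, `decode`, and `drive_step`, and the trivial arithmetic inside them, are hypothetical and say nothing about how Waymo's software is actually written.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorFrame:
    """One time step of readings from the three primary modalities (hypothetical)."""
    camera: List[float]  # stand-in for image features
    lidar: List[float]   # stand-in for lidar ranges, in meters
    radar: List[float]   # stand-in for radar ranges, in meters


@dataclass
class Action:
    """A command handed to the vehicle interface (hypothetical)."""
    steering: float
    acceleration: float


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def encode(frame: SensorFrame) -> List[float]:
    """Stand-in encoder: summarize each modality as a single number."""
    return [_mean(frame.camera), min(frame.lidar), min(frame.radar)]


def decode(features: List[float]) -> Action:
    """Stand-in decoder: turn the encoded scene into a driving action."""
    camera_feature, nearest_lidar, nearest_radar = features
    nearest = min(nearest_lidar, nearest_radar)
    # Toy rule: slow down when anything is closer than 10 m.
    acceleration = 1.0 if nearest > 10.0 else -1.0
    return Action(steering=camera_feature, acceleration=acceleration)


def drive_step(frame: SensorFrame) -> Action:
    """One on-board inference step: sensors -> encoder -> decoder -> action."""
    return decode(encode(frame))
```

Calling `drive_step` once per sensor frame mirrors the real-time loop in the conversation; in this sketch the whole loop runs locally, with nothing sent to a remote service.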

385.743 - 405.364 John Collison

Okay, so I get into my car, there's three main families of sensors, LiDAR, radar, and cameras. And then it is using that to first build a model of what's going on in the world, you know, where are all the other cars and things like that. And then you say, make decisions and then actuate that with the cars. That is the system that you're living in.

405.644 - 409.388 John Collison

And is all that inference done locally or presumably yes, nothing's in the cloud?

409.773 - 433.151 Dmitri Dolgov

Nothing real time. Nothing real time in the cloud. And there are some things that can happen in the cloud, but they're not required. Got it. What's an example of a nice to have that happens in the cloud? You can imagine a situation where we do, you know, some of it is not directly related to the task of driving, but after you leave the car, we want to check that, you know, the car is not dirty.

433.512 - 457.33 Dmitri Dolgov

You didn't leave anything there. If you did leave, you know, an item, well, if you left a mess, then, you know, I want to send the car back to one of our depots, get it cleaned up. If you left an item there, you know, your phone, all right, we want to detect that and then, you know, send it to our lost and found and let you know, right? So that, you know, we do by...

457.31 - 461.477 Dmitri Dolgov

asking a model that actually lives off board as opposed to having to put it on the car, right?

Chapter 6: How does AI influence decision-making in self-driving vehicles?

461.497 - 464.361 Dmitri Dolgov

Because it's not a real-time task related to the driving.

464.381 - 488.267 John Collison

So that's one example of something that... There are all these debates that go on on Twitter around self-driving. So I can think of, you know, end-to-end versus the more kind of modular approach. There's cameras only versus array of sensors. And I can't tell... are these debates actually interesting to an expert in the field?

488.548 - 494.579 John Collison

Or do you think these are just settled matters and they're just grist for the algorithm? I understand where the questions are coming from.

494.599 - 526.571 Dmitri Dolgov

I do find that often the way they're posed and the way the debate happens loses a lot of the nuance, and a lot of the detail that really matters. To me, the most interesting technical questions are at that level. Because the way we think about building the Waymo driver, it starts with a large off-board foundation model.

526.912 - 553.319 Dmitri Dolgov

Imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving, and what it means to be, you know, a good driver as opposed to a bad one. So that's the foundation. Then we specialize it into what we call three main off-board teachers.

553.339 - 573.97 Dmitri Dolgov

They are still large, high-capacity off-board models. There's the Waymo driver, there is the simulator, and then there's the critic. And those then get distilled into smaller models that you can run inference on faster. So the Waymo driver becomes the backbone, the ML backbone of what's in the car.

573.95 - 592.212 Dmitri Dolgov

The simulator, of course, is what powers our synthetic generative environment that can run on the cloud for training and for closed-loop evaluation of the system. And the critic... Sorry, does the simulator ever run locally? No. No, it doesn't. However, what I think is interesting, in a way,

592.192 - 614.453 Dmitri Dolgov

the way the decoder works, the way the model works: if you think about the generative task in the simulator of kind of creating those realistic worlds and how, you know, other people behave, how, you know, cars, pedestrians, cyclists and others behave, and the task that you have to solve on the car in real time, there is this, you know, fundamental shared

615.091 - 636.006 Dmitri Dolgov

capability of understanding how these objects relate to each other and predicting what they might do in the future if you are running on the car, and then generating, you know, sampling those probabilistic behaviors in a simulator. So it's different models, but, you know, this is why the shared foundation model is able to power both.
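
The step where large off-board teachers "get distilled into smaller models" is not spelled out in the conversation. As a rough illustration only, here is the textbook knowledge-distillation loss, in which a small student is trained to match a teacher's temperature-softened output distribution. The function names, the temperature value, and the use of a KL-divergence objective are all assumptions for this sketch; the transcript does not say which recipe Waymo uses.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over logits scaled by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between temperature-softened distributions.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's relative preferences among unlikely options.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what makes it usable as a training signal for the smaller model.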

Chapter 7: What operational infrastructure supports Waymo's services?

716.25 - 740.679 Dmitri Dolgov

Technology breakthroughs that happened over the years were critically important, primarily in AI, but also in other areas like, you know, compute, heavy compute. Now, I wouldn't characterize it as like going, you know, a thousand different dead ends and then having to retract and then finding like the one right path, I would characterize it as iterative learning and evolution.

740.88 - 755.58 Dmitri Dolgov

And then, you know, transformers came around. But transformers, for example, are a very general architecture that powers LLMs, powers, you know, our models. But how you apply them to that space, I think this is where... It didn't just fall out of transformers. Exactly, right?

755.6 - 770.698 Dmitri Dolgov

And of course, people like to talk about architectures, but architecture is important, but really a lot of it comes down primarily to your metrics, to your evaluation mechanisms, to all of the training recipes, and of course, data. Yes.

770.678 - 796.956 John Collison

LLMs are good at text or tokens specifically, and obviously perform best at domains that have some kind of single corpus of text they can work on, like coding, where it's very helpful that everything was just kind of textual already. And part of the success has been creating textual representations for domains so that we can then, you know, put LLMs against them. Can you describe how you...

797.577 - 805.553 John Collison

encode the world that you're seeing? I mean, are you just building a 3D map, like a 3D bitmap, essentially?

805.693 - 838.24 Dmitri Dolgov

So this is where I think we can get a bit into this question of what is the interface between the encoder and the decoder parts. And I think that touches also on the thing you flagged earlier where people like to debate end-to-end or not end-to-end. So the way... Let's talk a little bit about end-to-end and then get back to what is the interface between those two, right?

838.26 - 860.372 Dmitri Dolgov

So when we say end-to-end, what do we mean? We mean that it is some large ML model. Typically, you don't build them monolithically. You have different parts and different subgraphs. But what's important is that you can propagate, backprop the gradient of the loss function all through the different layers, so at every layer you can learn

860.352 - 869.843 Dmitri Dolgov

you know, the weights and the representations that matter for the final task. You don't force it through some, you know, narrow funnel between, let's say, the encoder and the decoder.
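
The definition of end-to-end given here, that the gradient of the final loss can be backpropagated through every part of the model, can be shown on a deliberately tiny example. The sketch below is an assumption-laden toy, not anything from Waymo: the "encoder" and "decoder" are each a single scalar weight, the loss is squared error, and the names `forward`, `gradients`, and `train` are made up for illustration.

```python
from typing import Tuple


def forward(w_enc: float, w_dec: float, x: float) -> Tuple[float, float]:
    """Encoder produces a hidden value; decoder maps it to the output."""
    hidden = w_enc * x
    output = w_dec * hidden
    return hidden, output


def gradients(w_enc: float, w_dec: float, x: float, target: float) -> Tuple[float, float]:
    """Chain rule for loss = (output - target) ** 2.

    The same loss gradient reaches both weights: the decoder weight directly,
    and the encoder weight by flowing back through the decoder.
    """
    hidden, output = forward(w_enc, w_dec, x)
    d_output = 2.0 * (output - target)
    grad_dec = d_output * hidden
    grad_enc = d_output * w_dec * x  # passes through the decoder to the encoder
    return grad_enc, grad_dec


def train(x: float, target: float, steps: int = 200, lr: float = 0.05) -> Tuple[float, float]:
    """Plain gradient descent that updates encoder and decoder together."""
    w_enc, w_dec = 0.5, 0.5
    for _ in range(steps):
        grad_enc, grad_dec = gradients(w_enc, w_dec, x, target)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec
```

Because `grad_enc` is computed from the final loss rather than from a separate hand-designed encoder objective, the encoder learns whatever representation helps the final task, which is the property being described.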

869.963 - 877.591 John Collison

Yeah, I think of a simple view of end-to-end being, you know, pixels go in and car actions come out, which is maybe a bit of an oversimplification, but yeah.

Chapter 8: What future advancements can we expect from Waymo?

884.459 - 912.67 Dmitri Dolgov

There, if you think about the... What will it take to build the driver that's capable of fully autonomous operations? You think about this entire ecosystem of the driver, the simulator, the critic. If that's all you do, pixels in, trajectories out, it becomes very difficult to do all of those three and achieve the high level of safety and performance that we require.

913.151 - 937.255 Dmitri Dolgov

And it becomes very difficult to kind of do it at scale. However, you know, it's kind of a very easy way to get started, right? You collect some data and, kind of analogous to the LLM world, right, the easiest thing you can do is, you know, pick a model. The easiest way to get started nowadays would be to just take a VLM. It already has a

937.808 - 958.99 Dmitri Dolgov

a language-aligned camera encoder. Yep. And then it has a decoder that, you know, can predict, you know, generate text. And you can fine-tune it and say, hey, instead of text, generate trajectories. You know, very, very doable. In fact, a while ago, we published a paper called EMMA that did exactly that. Yes. And it will actually...

959.206 - 968.066 Dmitri Dolgov

in the nominal case, drive pretty darn well, which is mind-blowingly impressive. That is very funny, yeah.
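
One way to picture "instead of text, generate trajectories" is to write a trajectory as a string that a text decoder could emit and then parse it back into numbers. The sketch below is purely illustrative: the `(x, y)` string format, the two-decimal precision, and the function names are assumptions, and nothing here reproduces the format used in the paper mentioned above.

```python
import re
from typing import List, Tuple

Waypoint = Tuple[float, float]

# Matches one "(x, y)" pair of decimal numbers, with optional minus signs.
_WAYPOINT_PATTERN = re.compile(
    r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
)


def trajectory_to_text(waypoints: List[Waypoint], decimals: int = 2) -> str:
    """Serialize future (x, y) positions as plain text a language model could emit."""
    return " ".join(
        f"({x:.{decimals}f}, {y:.{decimals}f})" for x, y in waypoints
    )


def text_to_trajectory(text: str) -> List[Waypoint]:
    """Parse generated text back into numeric waypoints, ignoring anything else."""
    return [(float(x), float(y)) for x, y in _WAYPOINT_PATTERN.findall(text)]
```

With a representation like this, fine-tuning targets are ordinary strings, so the model's existing text decoder needs no architectural change. The parser simply skips any text that does not look like a waypoint.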

968.306 - 976.685 John Collison

I mean, there's something to it. You're saying you can take an off-the-shelf model, which has nothing to do with driving to start with, and you'll get these good results. That's right. In the nominal case.

976.785 - 981.213 Dmitri Dolgov

I just want to be clear, it's... Orders of magnitude away from what you need.

981.233 - 985.579 John Collison

Yeah, you should not try it on the streets, but it works. It's like a talking horse. It's impressive that it's talking.

985.94 - 1007.792 Dmitri Dolgov

Exactly, exactly. And actually, if the product that you wanted to build was maybe a driver assist system, not a fully autonomous system, then maybe that's all you need to do. And then for that, you don't need all this other machinery of the simulator and the critic because the number of nines is drastically lower. But this is interesting because there is some...

1007.772 - 1037.386 Dmitri Dolgov

intuition behind, you know, why that works. If you think about the hard parts of driving, it's, you know, not unlike having a conversation, except that in the LLM world, right, you're modeling language, or maybe modeling a dialogue, in the space of sentences and words. What makes driving hard is also this kind of multi-agent, social, interactive part of it.
