John Collison

And if all you did when you trained the system was observe how other people drive, maybe just passively watching how people drive and how they interact, maybe also driving the car yourself, and then use imitation learning to train it.

1083.897 View full episode →

The a16z Show

From Models to Mobility: Building Waymo with Dmitri Dolgov

Mind you, that's not enough.

You have to do something in closed loop.

You have to do things like RLFT, which is also, you know, parallel to what we see outside.

RLFT?

RLFT.

Reinforcement learning-based fine-tuning.

Okay, yes.

So similar to reinforcement learning from human feedback in the LLM world, right?

You want to do proper closed-loop driving, where you explore all kinds of different situations and then you give it a reward signal to keep it in distribution.

For that, then, you need a realistic simulator.

Right?

Also, you know, if you want to have a good RL system, you need to have an opinion on the reward function.
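
[Editor's sketch] The closed-loop idea described here can be illustrated with a toy example. This is not Waymo's method. Everything in it is an assumption made for illustration: the one-dimensional "lane offset" simulator, the `rollout`, `average_return`, and `fine_tune` names, the starting gain of 0.2 standing in for a policy learned by imitation, the reward of negative squared offset, and simple hill climbing standing in for an actual RL update.

```python
import random

def rollout(gain, steps=50, seed=0):
    """Run the policy in closed loop in a toy simulator; return total reward."""
    rng = random.Random(seed)
    offset = 1.0  # start one unit away from the lane center (hypothetical units)
    total_reward = 0.0
    for _ in range(steps):
        steer = -gain * offset  # the policy reacts to the state it caused itself
        offset = offset + steer + rng.gauss(0.0, 0.05)  # toy dynamics with noise
        total_reward += -offset ** 2  # the "opinion": stay near the lane center
    return total_reward

def average_return(gain, seeds=range(20)):
    """Average closed-loop return over a fixed set of simulated scenarios."""
    return sum(rollout(gain, seed=s) for s in seeds) / len(seeds)

def fine_tune(gain, iterations=100, step=0.05):
    """Hill climbing on the closed-loop return (a stand-in for an RL update)."""
    best = average_return(gain)
    for _ in range(iterations):
        for candidate in (gain + step, gain - step):
            score = average_return(candidate)
            if score > best:
                gain, best = candidate, score
    return gain, best

imitation_gain = 0.2  # assumed result of imitation learning alone
before = average_return(imitation_gain)
tuned_gain, after = fine_tune(imitation_gain)
print(f"gain {imitation_gain:.2f} -> {tuned_gain:.2f}, return {before:.2f} -> {after:.2f}")
```

The point of the sketch is only the structure: the policy is scored on states it reaches by acting, which requires a simulator, and what counts as "better" is decided entirely by the reward function.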