John Collison
Okay, so then let's say we build this thing.
Just cameras and a camera encoder: pixels go in, trajectories go out.
The quality is sufficient to, you know, drive in the normal case.
It's not sufficient to deal with the long tail of, you know, the edge cases and hit the high bar of superhuman safety that we require.
So then you start asking the question, what else do you need?
Yes.
And if all you did when you trained the system was observe how other people drive, maybe just passively observing how people drive and how they interact, maybe also driving the car yourself, and then using imitation learning to train it,
we find that that's not enough.
You have to do something in closed loop.
You have to do things like RLFT, which also, you know, parallels what we see outside of driving.
RLFT?
RLFT.
Reinforcement-learning-based fine-tuning.
Okay, yes.
So similar to reinforcement learning from human feedback in the LLM world, right?
You want to do proper closed-loop driving, where you explore all kinds of different situations and then give it a reward signal to keep it in distribution.
For that, then, you need a realistic simulator.
Right?
You also, you know, if you want to have a good RL system, you need to have an opinion about the reward function.
This is where the critic comes in, right?
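The closed-loop idea described here can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the speaker's system: a hypothetical one-dimensional lane-keeping simulator, a policy with a single steering gain, a reward that penalizes distance from the lane center, and simple random-search policy improvement standing in for a real reinforcement learning algorithm. The point is only that the policy is scored on states its own actions produce, which passive imitation does not exercise.

```python
import random

def simulate(gain, steps=50, seed=0):
    """Closed-loop rollout in a toy 1-D lane-keeping simulator (hypothetical)."""
    rng = random.Random(seed)
    offset = 1.0          # lateral offset from lane center, arbitrary units
    total_reward = 0.0
    for _ in range(steps):
        steer = -gain * offset                    # policy acts on the state it caused
        offset += steer + rng.gauss(0.0, 0.05)    # simulator advances the world
        total_reward += -offset ** 2              # reward: stay near the lane center
    return total_reward

def evaluate(gain, seeds=(0, 1, 2, 3, 4)):
    """Average closed-loop return over a fixed set of simulated scenarios."""
    return sum(simulate(gain, seed=s) for s in seeds) / len(seeds)

def finetune(gain, iterations=200, seed=0):
    """Random-search fine-tuning: keep a perturbed gain only if the reward improves.

    This is a stand-in for an RL update, chosen for brevity.
    """
    rng = random.Random(seed)
    best = evaluate(gain)
    for _ in range(iterations):
        candidate = gain + rng.gauss(0.0, 0.1)
        score = evaluate(candidate)
        if score > best:
            gain, best = candidate, score
    return gain, best

# Start from a weak policy, as a stand-in for an imitation-only starting point.
base_gain = 0.05
base_return = evaluate(base_gain)
tuned_gain, tuned_return = finetune(base_gain)
print(f"before: gain={base_gain:.2f} return={base_return:.3f}")
print(f"after:  gain={tuned_gain:.2f} return={tuned_return:.3f}")
```

In this sketch the simulator and the reward function are the two ingredients the fine-tuning loop cannot do without, which mirrors the two requirements named in the conversation.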