Ahmed El-Kishky
Podcast Appearances
You'd work it out.
You'd think through, maybe make a mistake and fix it.
And you give an answer.
If you asked a scientist to work on a problem or a mathematician, they do the same thing.
Hard problems require, you know, more time, more thinking.
So we decided to lean into reinforcement learning as a way to get our models to actually, you know, think longer, think better.
And that's actually where the breakthrough came in.
What a breakthrough it was, too.
It was a crazy one.
Like, we wanted to see, like...
We wanted to a little bit mimic how a human, you know, thinks through these problems.
They try things out and maybe they go down wrong directions to a dead end.
They course correct.
And OpenAI has always been into, like, reinforcement learning.
From the days of, like, you know, Dota, playing video games, they really leaned into reinforcement learning as a tool that would bring about sort of next-level intelligence.
And there'd been some attempts to apply reinforcement learning to LLMs, but nothing at this scale.
So the, I guess, the Strawberry efforts, the o-series models, were the first, like, very serious attempt at getting reinforcement learning working on these large language models.
And it was honestly, like, amazing to sort of see it from the beginning, like, you know, when it started to now being at a performance level where it's competitive with some of the best competitive programmers.
Yeah.