Dwarkesh
And it thought a bunch.
And then it realized that the correct way to model the information you gain from a single yes-or-no outcome in RL is as the entropy of a binary random variable.
It made a graph which showed how the bits you gain per sample in RL versus supervised learning scale as the pass rate increases.
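One way to make this concrete (this is my own minimal sketch, not the graph Gemini produced): a single pass/fail outcome with pass rate p is a Bernoulli(p) variable, so the information it carries is the binary entropy H(p), which peaks at 1 bit when p = 0.5 and vanishes as p approaches 0 or 1.

```python
import math

def rl_bits_per_sample(pass_rate: float) -> float:
    """Information (in bits) carried by one yes/no RL outcome:
    the entropy of a Bernoulli(pass_rate) variable."""
    p = pass_rate
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Bits per sample at a few pass rates: near-zero at the extremes,
# maximal (1 bit) at a 50% pass rate.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"pass rate {p:.2f}: {rl_bits_per_sample(p):.3f} bits")
```

Supervised learning, by contrast, can extract information from every token of the target, which is the intuition behind the comparison in the graph.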
And as soon as I saw the graph that Gemini 3 made, immediately a ton of things started making sense to me.
Then I wanted to see if there was any empirical basis to this theory.
So I asked Gemini to code an experiment to show whether the improvement in loss scales in this way with pass rate.
I just took the code that Gemini output, copy-pasted it into a Google Colab notebook, and was able to run this toy ML experiment and visualize its results without a single bug.
It's interesting because the results look similar, but not identical, to what we would have expected.
And so I downloaded this chart and I put it into Gemini and asked it, what is going on here?
And it came up with a hypothesis that I think is actually correct, which is that we're capping how much supervised learning can improve in the beginning by having a fixed learning rate.
And in fact, we should decrease the learning rate over time.
It actually gives us an intuitive understanding for why in practice we have learning rate schedulers that decrease the learning rate over time.
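To illustrate what such a schedule looks like (a generic sketch, not the specific fix Gemini proposed): a cosine schedule is a common choice that starts at a high learning rate and smoothly decays it toward a small floor over training.

```python
import math

def cosine_lr(step: int, total_steps: int,
              lr_max: float = 1e-3, lr_min: float = 1e-5) -> float:
    """Cosine learning-rate decay from lr_max at step 0
    down to lr_min at the final step."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Large early steps let the loss drop quickly; smaller late steps
# avoid capping how finely the model can converge.
for step in (0, 25, 50, 75, 100):
    print(f"step {step:3d}: lr = {cosine_lr(step, 100):.6f}")
```

The hypothetical `lr_max`/`lr_min` values here are just illustrative defaults; frameworks like PyTorch ship this as a built-in scheduler (`CosineAnnealingLR`).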
I did this entire flow from coming up with this vague initial question to building a theoretical understanding to running some toy ML experiments, all with Gemini 3.
This feels like the first model where it can actually come up with new connections that I wouldn't have anticipated.
It's actually now become the default place I go to when I want to brainstorm new ways to think about a problem.
If you want to read more about RL scaling, you can check out the blog post that I wrote with a little help from Gemini 3.
And if you want to check out Gemini 3 yourself, go to gemini.google.
I am curious: you say we are back in an era of research, and you were there from 2012 to 2020. What is the vibe going to be now, if we go back to the era of research?
For example...