Dwarkesh
I respect its work.
It's not perfect yet.
I think it's actually better at the style on a word to word sentence to sentence level than it is at planning out a blog post.
I think
So I think there are possibly two reasons for it.
One, we don't know how the base model would have done at this task.
We know that all the models we see are, to some degree, reinforcement-learned into a kind of corporate speak mode.
You can get it somewhat out of that corporate speak mode, but I don't know to what degree this is actually doing its best to imitate Scott Alexander versus hit some average between Scott Alexander and corporate speak.
That's right.
And I don't think anyone knows except the internal employees who have access to the base model.
And the second thing, I think of, maybe just because it's trendy, as an agency or horizon failure.
Like, deep research is an okay researcher.
It's not a great researcher.
If you actually want to understand an issue in depth, you can't use deep research.
You've got to do it on your own.
So if you think, like, I spend maybe five to ten hours researching a really research-heavy blog post.
The METR thing, I know we're not supposed to use it for any task except coding, but, like, it says, on average, the AI's horizon is one hour.
So I'm guessing it just cannot plan and execute a good blog post.
It does something very superficial rather than actually going through the steps.