Dwarkesh Patel
Podcast Appearances
Yeah, actually, I was about to ask a very similar question about NanoChat.
Because since you just coded it up recently, every single step in the process of building a chatbot is fresh in your RAM.
And I'm curious if you had similar thoughts about like, oh, there was no one thing that was relevant to going from GPT-2 to NanoChat.
What are sort of like surprising takeaways from the experience?
What is the best way for somebody to learn from it?
Is it just like delete all the code and try to re-implement it from scratch, or try to add modifications to it?
Yeah, I think that's a great question.
Interesting.
You tweeted out that coding models were actually of very little help to you in assembling this repository.
And I'm curious why that was.
And they just couldn't internalize that you had your own?
The reason I think this question is so interesting is because the main story people have about AI exploding and getting to superintelligence pretty rapidly is AI automating AI engineering and AI research.
And so they'll look at the fact that you can have Claude Code make entire applications, CRUD applications, from scratch and be like, if you had this same capability inside of OpenAI and DeepMind and everything, well, just imagine a thousand of you or a million of you in parallel finding little architectural tweaks.
And so it's quite interesting to hear you say that this is the thing they're sort of asymmetrically worse at.
And it's quite relevant to forecasting whether the AI 2027 type explosion is likely to happen anytime soon.
I think that's a good way of putting it.
Very naive question, but the architectural tweaks that you're adding to NanoChat, they're in a paper somewhere, right?
They might even be in a repo somewhere.