Dwarkesh Patel
it has many sessions within that outer loop, then this continual learning, where it fine-tunes itself or writes to an external memory or something, will just sort of emerge spontaneously.
Do you think these are things that are plausible?
I just don't really have a prior over how plausible that is.
How likely is that to happen?
Interesting.
In 10 years, do you think it'll still be something like a transformer, but with a much more modified attention and more sparse MLPs and so forth?
It's surprising that all of those things together only halved the error, which is like 30 years of progress.
Maybe half is a lot, because if you halve the error, that actually means that... Half is a lot, yeah.
Yeah, actually, I was about to ask a very similar question about NanoChat.
Because since you just coded it up recently, every single step in the process of building a chatbot is fresh in your RAM.
And I'm curious if you had similar thoughts, like, oh, there was no one thing that was responsible for going from GPT-2 to NanoChat.
What are sort of like surprising takeaways from the experience?
What is the best way for somebody to learn from it?
Is it just to delete all the code and try to re-implement it from scratch, or to try to add modifications to it?
Yeah, I think that's a great question.
Interesting.
You tweeted that coding models were actually of very little help to you in assembling this repository.