Trenton Bricken
Right.
But just through the number of samples they gave in the context.
And you can see it if you plot the number of shots, or examples, on the x-axis and then the loss it gets on just ordinary least squares regression.
Yeah.
That will go down with time.
Yeah.
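For reference, here is a minimal sketch of the classical curve being described: how the test loss of ordinary least squares falls as the number of examples ("shots") grows. It is only the OLS baseline, not a model's in-context loss. The setup is a hypothetical assumption chosen for illustration: a 1-D line through the origin, y = w·x + noise, with w ~ N(0, 1), x ~ Uniform(0.5, 1.5), and noise standard deviation 0.5. The function name `ols_test_loss` is likewise hypothetical.

```python
import random


def ols_test_loss(num_shots: int, trials: int = 2000, noise_std: float = 0.5, seed: int = 0) -> float:
    """Average test MSE of a 1-D least-squares fit (no intercept) from num_shots examples.

    Hypothetical setup: y = w * x + noise, w ~ N(0, 1), x ~ Uniform(0.5, 1.5).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw a fresh task (true slope) and its "in-context" examples
        w = rng.gauss(0.0, 1.0)
        xs = [rng.uniform(0.5, 1.5) for _ in range(num_shots)]
        ys = [w * x + rng.gauss(0.0, noise_std) for x in xs]
        # Closed-form OLS slope for a line through the origin
        w_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        # Score the fitted slope on a fresh, noisy query point
        x_q = rng.uniform(0.5, 1.5)
        y_q = w * x_q + rng.gauss(0.0, noise_std)
        total += (w_hat * x_q - y_q) ** 2
    return total / trials


for n in [1, 2, 4, 8, 16, 32]:
    print(n, round(ols_test_loss(n), 4))
```

Under these assumptions the loss drops quickly with more shots and levels off near the noise floor (noise_std² = 0.25), which is the shape of curve one would compare a model's in-context loss against.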
The NeurIPS best paper this year, where Rylan Schaeffer was the lead author, points to this as the emergence mirage: people will have a task where you get the right or wrong answer depending on whether you've sampled the last five tokens correctly.
And so naturally, you're multiplying the probabilities of sampling each of those correctly.
And if you don't have enough nines for reliability, then you're not going to get emergence.
And all of a sudden, you do.
And it's like, oh my gosh, this ability is emergent, when actually it was kind of almost there to begin with.
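The arithmetic behind this point can be sketched in a few lines. As a simplifying assumption (not something stated in the conversation), treat each of the k answer tokens as independently correct with the same per-token probability p. An exact-match metric then scores p^k, which sits near zero and then shoots up, while a smoother metric such as the expected number of correct tokens, k·p, rises linearly with p. The function names here are hypothetical.

```python
def task_success_probability(per_token_accuracy: float, num_tokens: int) -> float:
    """Exact-match success: all num_tokens sampled tokens must be correct.

    Assumes (hypothetically) each token is independently correct with the same probability.
    """
    return per_token_accuracy ** num_tokens


def expected_correct_tokens(per_token_accuracy: float, num_tokens: int) -> float:
    """A smoother metric under the same assumption: it grows linearly with per-token accuracy."""
    return per_token_accuracy * num_tokens


# Steady per-token gains look like a sudden jump under exact match over 5 tokens
for p in [0.5, 0.7, 0.9, 0.99, 0.999]:
    print(
        f"per-token {p:.3f}: exact match {task_success_probability(p, 5):.4f}, "
        f"expected correct tokens {expected_correct_tokens(p, 5):.3f}"
    )
```

Going from 0.5 to 0.9 per-token accuracy moves exact match from about 3% to about 59%, which is the "not enough nines" effect described above.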
I mean, it was less than a year ago that we introduced 100K context windows.
And I think everyone was pretty surprised by that.
So yeah, everyone just kind of had this sound bite: quadratic attention costs, so we can't have long context windows.
And here we are.
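The "quadratic attention costs" sound bite comes from standard full self-attention, where every query token is scored against every key token, so the score matrix has n × n entries for a context of n tokens. A back-of-the-envelope sketch follows; the 10K-to-100K comparison and the 2-byte (fp16) entry size are illustrative assumptions, and the function name is hypothetical. Memory-efficient kernels such as FlashAttention compute attention in tiles without storing the full matrix, so the memory figure is only what a naive implementation would need.

```python
def attention_score_entries(context_len: int, num_heads: int = 1) -> int:
    # Full (non-sparse) self-attention: each query token scores every key token.
    return num_heads * context_len * context_len


short_ctx, long_ctx = 10_000, 100_000
ratio = attention_score_entries(long_ctx) / attention_score_entries(short_ctx)
print(f"{long_ctx // short_ctx}x longer context -> {ratio:.0f}x more attention scores")

# Naive memory for one head's full score matrix, assuming 2-byte (fp16) entries
naive_bytes = attention_score_entries(long_ctx) * 2
print(f"naive score matrix at {long_ctx} tokens: {naive_bytes / 1e9:.1f} GB per head")
```

A 10x longer context means 100x more pairwise scores, which is why long context windows were widely assumed to be impractical.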
So yeah, the benchmarks are being actively made.
Yeah, it's interesting because you do get more forward passes by having more tokens.
Right.
My one gripe, I guess I have two gripes with this, though.