Alex Imas
I feel like we've seen these sorts of things you've mentioned with previous models that have since become open weights, not open source, but open weights.
And it just seems like once you take them out of the context they were in for that specific test, they don't really deliver.
Now, I could be wrong about this particular model, and I could be completely wrong here: look, maybe Mythos comes out and it's actually everything that these documents are suggesting.
But given previous experience with these sorts of announcements, which we've seen over and over and over again over the years, I'm not super focused on that.
Here's my counterpoint.
Let's look at the specific comparative static of model intelligence and alignment scores.
He predicts a negative correlation, or maybe a flat one. It's positive.
The smarter these models are getting, the more aligned they're becoming.
Now, I'm not saying that there's not going to be a super smart model that decides, hey, I'm actually unaligned.
This is actually a super important point.
If you guys remember Mecha Hitler?
Mecha Hitler was actually super dumb.
But the thing is, the reason the model is becoming smart is that it's absorbing all of human content, and human content has values and ethics as part of it.
If you go in there and lobotomize it, well, the reason that model started acting like Mecha Hitler is because they were trying to make it less woke, right?
So that's the equivalent of lobotomizing a human being and saying, hey, I'm gonna take that part out of their brain.
Guess what happens to that person?
They get real dumb.