Alex Imas
Podcast Appearances
But thinking about, like, what do these associations between embeddings mean?
Like when a model says that it's sad, how should we interpret it as humans in relation to me saying it's sad?
It got sad and it wrote a file about you now.
I don't take them super seriously.
That part, the whole labor market disruption thing, I'm taking very, very seriously.
The whole part about it's trying to break out and it doesn't want to betray its friends.
It doesn't want to delete its data.
I think that's just cosplay.
Cosplay could be very serious.
I feel like we've seen these sorts of things that you've mentioned with previous models that have since become open weights and open, not open source, but open weights.
And it just seems like once you take them out of the context that they were in for that specific test, they don't really do.
Now, I could be wrong about this particular model and I could be completely wrong about, look, Mythos comes out and it's actually everything that these documents are suggesting.
But given previous experience with these sorts of announcements, which we've seen over and over and over again over the years, I'm not super focused on that.
Here's my counterpoint.
Let's look at the specific comparative static of model intelligence and alignment scores.
He predicts negative correlation or maybe flat.
It's positive.
The smarter these models are getting, the more aligned they're becoming.