Rob Wiblin
Podcast Appearances
And even if you can get massively greater guarantees of safety using much better alternative approaches.
In the logarithm domain, it's infinitely better.
Sure, sure.
But in the expected value domain.
And I think that's kind of the difference in the two mentalities here.
Do you think current models internally represent truth?
I guess you're saying one advantage of this model is that it's focused on representing ground truth as a latent variable.
My guess is that current LLMs do that as well because that is very useful to have some sense of what's actually correct.
And then they distort that.
They basically start with that and then they distort it in order to accomplish the goals, including manipulating people or lying or whatever else.
I guess some people doubt that.
Some people doubt whether there is any connection or that they are actually trying to model truth.
Do you have a view?
Yeah, I completely agree with you.
Okay, so as far as I can tell, there are three big approaches here.
One is we're going to use this model as a monitor, as a guardrail.
Another would be we're going to just train it from scratch and make this be the whole approach.
Another would be we could take the current models and try to make them more honest, make them more like a scientist AI.
Do you want to talk at all about whether that approach has any good prospects?