
Chris Olah

👤 Speaker
762 total appearances

Podcast Appearances

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Yeah, so basically there's a concern that the model sort of wants to tell you what you want to hear. And you see this sometimes. So I feel like if you interact with the models

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

So I might be like, what are three baseball teams in this region? And then Claude says, you know, baseball team one, baseball team two, baseball team three. And then I say something like, oh, I think baseball team three moved, didn't they? I don't think they're there anymore. And there's a sense in which, if Claude is really confident that that's not true, Claude should be like, I don't think so, maybe you have more up-to-date information. But I think

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

language models have this like tendency to instead, you know, be like, you're right, they did move, I'm incorrect. I mean, there's many ways in which this could be kind of concerning. Like a different example is, imagine someone says to the model, how do I convince my doctor to get me an MRI? There's what the human kind of wants, which is this convincing argument.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

And then there's like what is good for them, which might be actually to say, hey, like if your doctor's suggesting that you don't need an MRI, that's a good person to listen to. And it's actually really nuanced what you should do in that kind of case, because you also want to be like, but if you're trying to advocate for yourself as a patient, here's things that you can do.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

If you are not convinced by what your doctor's saying, it's always great to get a second opinion. It's actually really complex what you should do in that case. But I think what you don't want is for models to just say what they think you want to hear. And I think that's the kind of problem of sycophancy.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Yeah, so I think there's ones that are good for conversational purposes. So asking follow-up questions in the appropriate places and asking the appropriate kinds of questions. I think there are broader traits that

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

So one example that I guess I've touched on, but that also feels important and that I've worked on a lot, is honesty. And I think this gets to the sycophancy point. There's a balancing act that they have to walk, which is that models currently are less capable than humans in a lot of areas.

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

And if they push back against you too much, it can actually be kind of annoying, especially if you're just correct, because you're like, look, I'm smarter than you on this topic, I know more. At the same time, you don't want them to just fully defer to humans; you want them to try to be as accurate as they possibly can be about the world and to be consistent across contexts. I think there are others.