Mike Krieger:
That's getting encoded into the model: how should it behave? Where should it be willing to answer and chime in, and where should it not be? And that's, I think, very linked to the responsible scaling piece. Then the next layer is the actual system prompt. So, in the spirit of transparency, we actually just started publishing our system prompts.
People would always figure out clever ways to try to reverse-engineer them anyway, and we're like, if that's going to happen, why don't we just treat it like a changelog and be transparent? I think as of last week, you can go online and actually see what we've changed.
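For context on what that layer is, this is roughly what the system prompt looks like from the API side: a field that sits alongside the conversation rather than inside it, which is part of what makes it practical to version and publish like a changelog. A minimal sketch using the Anthropic Python SDK; the model id and prompt text here are illustrative stand-ins, not Anthropic's actual published prompt.

```python
# Minimal sketch: the system prompt is a separate field from user messages,
# layered on top of whatever behavior was trained into the model.
# The prompt text and model id below are illustrative, not Anthropic's real values.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "The assistant is Claude, created by Anthropic. "
    "Claude answers directly and declines harmful requests."
)  # hypothetical excerpt; the real published prompts live in Anthropic's release notes

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model id
    max_tokens=512,
    system=SYSTEM_PROMPT,              # guidance applied on every request
    messages=[{"role": "user", "content": "Summarize this article for me."}],
)
print(response.content[0].text)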
That's another place where there's additional guidance we give the model around how it should act. Of course, ideally it gets baked in earlier. People can always find ways to try to get around it, but we're fairly good at preventing jailbreaks. And then the last piece is where our trust and safety team sits.
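Conceptually, the layering he's describing looks something like the sketch below: training-time values first, then the system prompt, then runtime trust-and-safety checks as the final gate before a response goes out. This is a hypothetical illustration of defense in depth, not Anthropic's actual implementation; every function name here is invented.

```python
# Hypothetical sketch of the defense-in-depth layering described above:
# training-time values, then the system prompt, then last-mile runtime checks.
# None of these functions reflect Anthropic's actual internals.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

def trained_in_behavior(prompt: str) -> Verdict:
    # Layer 1: values baked into the weights during training. There is nothing
    # to "call" at runtime; modeled as a gate purely for illustration.
    return Verdict(True)

def system_prompt_guidance(prompt: str) -> Verdict:
    # Layer 2: the published system prompt steering tone and refusals.
    return Verdict(True)

def last_mile_screen(completion: str) -> Verdict:
    # Layer 3: trust-and-safety checks on the output, the final chance to
    # catch what earlier layers missed (e.g. a successful jailbreak).
    if "<hypothetical-harm-marker>" in completion:
        return Verdict(False, "tripped last-mile filter")
    return Verdict(True)

def respond(prompt: str, generate: Callable[[str], str]) -> str:
    # Earlier layers shape the request; the last layer screens the response.
    for gate in (trained_in_behavior, system_prompt_guidance):
        verdict = gate(prompt)
        if not verdict.allowed:
            return f"[declined: {verdict.reason}]"
    completion = generate(prompt)
    verdict = last_mile_screen(completion)
    return completion if verdict.allowed else f"[withheld: {verdict.reason}]"
```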
And, you know, the trust and safety team is the closest analogue. At Instagram we called it trust and safety at one point and well-being at another, but it's that same last-mile remediation piece. I would bucket that work into two pieces. One is: what are people doing with Claude and publishing out to the world? So with Artifacts,
it was the first product we had with any social aspect at all: you could create an artifact, hit share, and actually put it on the web. And that's a very common set of problems in shared content. I lived shared content for almost 10 years at Instagram, and here it was like, wait, do people have usernames? How does content get reported?
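Those two questions, what identity is attached to a piece of shared content and how it gets reported, translate fairly directly into a data model. Here is a hypothetical minimal version for the sake of illustration; it is not Instagram's or Anthropic's actual schema, and all the names are invented.

```python
# Hypothetical minimal data model for shared content plus reporting,
# covering the two questions raised above: identity and reportability.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class ReportReason(Enum):
    ABUSE = "abuse"
    COPYRIGHT = "copyright"
    OTHER = "other"

@dataclass
class SharedArtifact:
    artifact_id: str
    owner_id: str          # "do people have usernames?" -> some stable identity
    public_url: str
    taken_down: bool = False

@dataclass
class Report:
    artifact_id: str
    reporter_id: str
    reason: ReportReason
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def file_report(artifact: SharedArtifact, reporter_id: str,
                reason: ReportReason, queue: list) -> None:
    # Reports land in a human-review queue rather than triggering
    # automatic takedown.
    queue.append(Report(artifact.artifact_id, reporter_id, reason))
```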
We delayed that launch for a week and a half to make sure we had the right T&S, trust and safety, pieces in place: moderation, reporting, queues for taking content down, limited distribution, figuring out what it means for people on Team plans versus individuals. It was one of those things where I got very excited, like, let's ship this, sharing artifacts, and then
a week later, we're like, okay, now we can ship it, because we've actually sorted these things out. So that's the content moderation side, I would say. And then on the response side, we also have additional pieces that sit there, for example around preventing the model from reproducing copyrighted content. That's something we want to prevent in the completions as well.
And then other harms that go against the way we think the model should behave, which ideally have been caught earlier, but if they aren't, they can get caught at that last mile. I was actually talking to our head of trust and safety last week.
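As a rough illustration of that last-mile layer, a post-generation screen might look like the sketch below. The categories and check functions are invented for the example; real systems would use trained classifiers and reference corpora, not the stand-ins shown here.

```python
# Hypothetical last-mile screen run on completions before they reach the user.
# The checks are stand-ins for illustration; nothing here reflects Anthropic's
# actual classifiers.
def looks_like_verbatim_copyrighted_text(text: str) -> bool:
    # Stand-in: a real system would compare against reference corpora
    # or use a trained classifier.
    return False

def violates_behavior_policy(text: str) -> bool:
    # Stand-in for a harm classifier catching what earlier layers missed.
    return False

def screen_completion(completion: str) -> tuple[bool, str]:
    checks = [
        ("copyright", looks_like_verbatim_copyrighted_text),
        ("harm", violates_behavior_policy),
    ]
    for label, check in checks:
        if check(completion):
            return False, label  # block, and record which check fired
    return True, "ok"
```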