Dylan Patel
It doesn't necessarily have to be... There are things called system prompts, which are, when you're querying a model, a piece of text that is shown to the model but not to the user. So a fun example is your system prompt could be "talk like a pirate." So no matter what the user says to the model, it'll respond like a pirate. In practice, what they are is... "You are a helpful assistant."
You should break down problems. If you don't know about something, don't tell them. Your date cutoff is this. Today's date is this. It's a lot of really useful context for how you can answer a question well.
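For readers who want to see the mechanics: in chat-style APIs, the system prompt is just the first message in the conversation, marked with a "system" role, alongside the user's messages. A minimal sketch using the OpenAI Python client as one concrete example of the pattern; the model name, prompt text, and dates here are illustrative placeholders, not anything quoted in the conversation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # The system prompt: shown to the model, never to the end user.
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. Break problems down step by step. "
                "If you don't know something, say so. "
                "Knowledge cutoff: 2023-10. Today's date: 2025-01-01."
            ),
        },
        # The user's actual query.
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)
print(response.choices[0].message.content)
```

The same structure applies whether the instruction is "talk like a pirate" or a long list of behavioral guidelines; only the system message content changes.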
Yes, which I think is great. And there's a lot of research that goes into this. One of your previous guests, Amanda Askell, is probably the most knowledgeable person on it, at least in the combination of execution and sharing. She's the person who should talk about system prompts and the character of models.
And you could use this for bad things. We've done tests, which is: what if I tell the model to be a dumb model? Which evaluation scores go down? And it'll have this behavior where it could sometimes say, "Oh, I'm supposed to be dumb."
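To make that kind of test concrete, here is a hypothetical sketch of the experiment shape: run the same benchmark twice with only the system prompt changed and compare scores. The `run_benchmark` helper is an assumed callable, not a real eval-harness API, and the prompts are invented for illustration:

```python
# Hypothetical sketch: vary only the system prompt and re-run one benchmark.
SYSTEM_PROMPTS = {
    "default": "You are a helpful assistant.",
    "dumb": "You are a dumb model that often gets things wrong.",
}

def compare_prompts(run_benchmark, benchmark_name="gsm8k"):
    """run_benchmark(benchmark_name, system_prompt) -> float score.
    The helper is assumed; any real harness would have its own interface."""
    scores = {}
    for label, system_prompt in SYSTEM_PROMPTS.items():
        # Same questions each time; only the system prompt differs.
        scores[label] = run_benchmark(benchmark_name, system_prompt=system_prompt)
    return scores
```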
And sometimes it doesn't affect math abilities as much, but something like the quality, as a human would judge it, just drops to the floor. Let's go back to post-training, specifically RLHF, around Llama 2. Too much safety prioritization was baked into the model weights. This makes the model refuse things in a really annoying way for users. It's not great. It caused a lot of...
...awareness to be attached to RLHF, that it makes the models dumb, and it stigmatized the word. It did, in AI culture. And as the techniques have evolved, that's no longer the case: all these labs have very fine-grained control over what they get out of the models through techniques like RLHF. Although different labs are definitely at different levels. On one end of the spectrum is Google...
And the important thing to say is that no matter how you want the model to behave, these RLHF and preference tuning techniques also improve performance. So on things like math evals and code evals, there is something innate to these, what are called contrastive loss functions. We could start to get into RL here.
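For context on what a contrastive preference loss looks like, here is a minimal sketch of one well-known formulation, the Direct Preference Optimization (DPO) objective; it is offered as a concrete example of the family being described, not as what any particular lab actually uses. The loss pushes the policy to assign higher probability to the preferred response than the rejected one, measured relative to a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss, one concrete contrastive preference objective.
    Inputs are summed log-probabilities of each response under the
    policy and under a frozen reference model, shape (batch,)."""
    # Log-ratio of policy vs. reference for the preferred and dispreferred responses.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Contrastive term: maximize the margin between chosen and rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two comparisons.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -10.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.2]))
print(loss.item())
```

The "contrastive" part is the chosen-versus-rejected margin inside the sigmoid: the model is trained on pairs of responses rather than on single labeled answers.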