Nathan Lambert
Podcast Appearances
It's going to come to the rest of society, too.
This is where the whole like hacking models comes from, right? Like GPT will not tell you how to make anthrax, but if you try really, really hard, you can eventually get it to tell you about anthrax because they didn't filter it from the pre-training data set, right?
I mean, people have been meming on, like, games and other stuff, how to say things without saying Tiananmen Square. But, yeah, there are always different ways to do it. There's, hey, the internet as a whole does tend to just have a slight left bias, right? Because it's always been richer, more affluent, right?
Younger people on the internet, relative to the rest of the population. So there is already inherently a slight left bias on the internet. And so how do you filter things that are this complicated, right? Some of these can be, you know, factual or non-factual. Tiananmen Square is obviously the example of a factual one, but it gets a lot harder when you're talking about aligning to an ideal, right?
And so Grok, for example: Elon's tried really hard to make the model not be super PC and woke, but the best way to do pre-training is to throw the whole freaking internet at it and then figure it out later. But at the end of the day, the model at its core still has some of these ideals.
You still ingested Reddit's r/politics, which is probably the largest political discussion board in the world that's freely available to scrape. And guess what? That's left-leaning, right? And so, you know, there are some aspects like that you just can't censor unless you try really, really, really, really, really hard.
And I mean, you also have the ingested data of, like, Twitter or Reddit's r/The_Donald, which is also super pro-Trump, right? And then you have fascist subreddits, or you have communist subreddits. So the model in pre-training ingests everything. It has no worldview.
Now, it does have some skew, because more of the text is skewed a certain way, which is generally slightly left, but also somewhat intellectual. The general internet is just a certain way. And then, as Nathan's about to describe eloquently, you can elicit certain things out.