
John Siracusa

👤 Speaker
5126 total appearances

Appearances Over Time

Podcast Appearances

Accidental Tech Podcast
624: Do Less Math in Computers

company versus a Chinese company just seems like they're mad because somebody else is doing the same thing to them that they did to everybody else.

Accidental Tech Podcast
624: Do Less Math in Computers

Yeah, this is what we talked about when we were first discussing ChatGPT: the fact that they had hundreds of thousands of human-generated question and answer pairs to help train it. Yes, they trained on all the knowledge on the internet, but there was also a huge human-powered effort of, let's tailor-make a bunch of

Accidental Tech Podcast
624: Do Less Math in Computers

what we think are correct or good question and answer pairs and feed them in. And they had to pay human beings to make those so they could use them to train their model. That obviously costs a lot of money and takes a lot of time. And Ben gives the AlphaGo example: if we try to make a computer program play a game really well, should we have experts go teach the AI?
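The human-written question and answer pairs being described here are, in practice, usually stored as simple prompt/response records, often one JSON object per line (JSONL). A minimal sketch of that shape, with made-up example pairs and an illustrative flattening format (nothing here reflects any lab's actual dataset):

```python
import json

# Hypothetical stand-ins for the human-written question/answer pairs
# discussed above; a real fine-tuning set would hold many thousands.
pairs = [
    {"question": "What causes tides?",
     "answer": "Mostly the gravitational pull of the Moon and, to a lesser extent, the Sun."},
    {"question": "Explain recursion in one sentence.",
     "answer": "Recursion is when a function solves a problem by calling itself on smaller inputs."},
]

def to_jsonl(records):
    """Serialize records one JSON object per line (JSONL), a common
    interchange format for fine-tuning datasets."""
    return "".join(json.dumps(r) + "\n" for r in records)

def to_training_text(pair):
    """Flatten one pair into the kind of single text sequence a language
    model is trained on; the exact template is illustrative."""
    return f"Question: {pair['question']}\nAnswer: {pair['answer']}"

dataset = to_jsonl(pairs)
example = to_training_text(pairs[0])
```

The expensive part is not this plumbing but paying people to write the pairs, which is exactly the cost the discussion is about.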

Accidental Tech Podcast
624: Do Less Math in Computers

What's the best move here or there? Or should we just say, no humans are involved. Here's the game, here are the rules; just run for a huge amount of time with the reward function of winning the game, and eventually the model will figure out how to be the best Go player in the world, rather than us carefully saying, well, you've got to know this strategy, you've got to know that, or whatever.
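The reward-only setup being described (no human strategy input, just the rules and a win/lose signal) can be sketched with a toy game. Below is tabular Monte Carlo self-play on one-pile Nim, a deliberately tiny stand-in rather than anything AlphaGo or DeepSeek actually used; all names and constants are illustrative:

```python
import random

ACTIONS = (1, 2, 3)   # each turn, take 1-3 stones
START = 10            # pile size at the start of every game

def legal(stones):
    return [a for a in ACTIONS if a <= stones]

def train(episodes=50_000, alpha=0.1, epsilon=0.2, seed=0):
    """Self-play with only a win/lose reward: no human ever tells the
    learner which moves are good."""
    rng = random.Random(seed)
    Q = {}  # Q[(stones, take)] -> estimated value for the player to move
    for _ in range(episodes):
        stones, history, player = START, [], 0
        while stones > 0:
            acts = legal(stones)
            if rng.random() < epsilon:           # explore
                a = rng.choice(acts)
            else:                                # exploit current estimates
                a = max(acts, key=lambda x: Q.get((stones, x), 0.0))
            history.append((player, stones, a))
            stones -= a
            player ^= 1
        winner = history[-1][0]  # whoever took the last stone wins
        for p, s, a in history:  # Monte Carlo update: +1 win, -1 loss
            g = 1.0 if p == winner else -1.0
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (g - old)
    return Q

def best_move(Q, stones):
    return max(legal(stones), key=lambda a: Q.get((stones, a), 0.0))
```

Given only the rules and the terminal reward, the learner rediscovers the known winning strategy for this game (always leave the opponent a multiple of four stones), which is the point the discussion is making about removing humans from the loop.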

Accidental Tech Podcast
624: Do Less Math in Computers

Obviously, getting the humans out of the loop saves money and saves time, and it removes some of the blind alleys you might go down, because humans are going to do a particular thing that works a particular way, and we don't know that that's the correct solution.

Accidental Tech Podcast
624: Do Less Math in Computers

So I'm assuming the R in both R1 and R1-Zero stands for reinforcement learning, and maybe the zero stands for – I'm trying to parse their names. Who knows? The fact that we took out the human factor entirely and –

Accidental Tech Podcast
624: Do Less Math in Computers

We'll just train this thing entirely with reinforcement learning on its own; we don't have to guide it in any way. That seems like it's probably a better approach, because obviously the human-feedback approach is not really scalable beyond a certain point, right? You can keep scaling up the computing part as computers get faster and better and you give it more power and money and so on, but you can't employ every human on the planet to be making question and answer pairs if you get to that scaling point. So this seems like a fruitful

Accidental Tech Podcast
624: Do Less Math in Computers

approach. And again, practically speaking, if you want to do it with less money and in less time, you can't hire 100,000 human beings to make questions and answers for your thing. So they didn't. And it turns out they could make something that worked pretty well even without doing that.

Accidental Tech Podcast
624: Do Less Math in Computers

We talked about that in a past ATP episode: how mad they were when people were prompt engineering, trying to figure it out, saying, I know you're hiding the chain of thought. The chain of thought is how it's thinking through the problem. They show you a summary of it, but they don't show you the real one, right?

Accidental Tech Podcast
624: Do Less Math in Computers

And you can read the blog posts from a while ago about why OpenAI did that. But then people were like, I figured out that if you prompt the o1 model in this way, it will tell you about its chain of thought. And OpenAI was like, that's against our terms of service; you can't look under the covers of how our thing works. You're not allowed to do that. And they were banning accounts and stuff.
