Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI
15 May 2024
Chatted with John Schulman (cofounded OpenAI and led ChatGPT creation) on how post-training tames the shoggoth, and the nature of the progress to come...

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Timestamps

* (00:00:00) - Pre-training, post-training, and future capabilities
* (00:16:57) - Plan for AGI 2025
* (00:29:19) - Teaching models to reason
* (00:40:50) - The Road to ChatGPT
* (00:52:13) - What makes for a good RL researcher?
* (01:00:58) - Keeping humans in the loop
* (01:15:15) - State of research, plateaus, and moats

Sponsors

If you’re interested in advertising on the podcast, fill out this form.

* Your DNA shapes everything about you. Want to know how? Take 10% off our Premium DNA kit with code DWARKESH at mynucleus.com.
* CommandBar is an AI user assistant that any software product can embed to non-annoyingly assist, support, and unleash their users. Used by forward-thinking CX, product, growth, and marketing teams. Learn more at commandbar.com.
Full Episode
Today, I have the pleasure of speaking with John Schulman, who is one of the co-founders of OpenAI and leads the post-training team here. He also led the creation of ChatGPT and is the author of many of the most important and widely cited papers in AI and RL, including PPO. So, John, really excited to chat with you. Thanks for coming on the podcast.
Thanks for having me on the podcast. I'm a big fan.

Thank you for saying that. So the first question I had is: we have these distinctions between pre-training and post-training. Beyond what is actually happening in terms of loss functions and training regimes, I'm just curious, taking a step back conceptually, what kind of thing is pre-training creating?
What does post-training do on top of that?
In pre-training, you're basically training the model to imitate all of the content on the internet or on the web, including websites, code, and so forth. So you get a model that can generate content that looks like random web pages from the internet. The model is also trained to maximize likelihood, so it has to put a probability on everything.

The objective is basically predicting the next token given the previous tokens. Tokens are like words or parts of words. Since the model has to put a probability on each token, and we're training it to maximize log probability, it ends up being very well calibrated. So it can not only generate the content of the web, it can also assign probabilities to everything.
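The objective Schulman describes can be sketched in a few lines: sum the log probability of each next token given the previous ones, which is exactly what pre-training maximizes. This is a toy, hypothetical model (a random single-layer next-token predictor over a four-word vocabulary), not anything resembling a real language model; the vocabulary, weights, and corpus are all made up for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary and "model" for illustration only.
vocab = ["the", "cat", "sat", "dog"]
tok = {w: i for i, w in enumerate(vocab)}

def softmax(z):
    # Convert logits to a probability distribution over the vocabulary.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
# A single lookup table of logits: row i gives next-token logits after token i.
W = rng.normal(size=(len(vocab), len(vocab)))

def sequence_log_prob(tokens, W):
    """Sum of log P(next token | previous token) over the sequence --
    the quantity that the pre-training objective maximizes."""
    total = 0.0
    for prev, nxt in zip(tokens[:-1], tokens[1:]):
        p = softmax(W[tok[prev]])
        total += np.log(p[tok[nxt]])
    return total

lp = sequence_log_prob(["the", "cat", "sat"], W)

# Because the model outputs a full distribution at each step, the
# probabilities over the vocabulary sum to 1 -- this is the sense in
# which maximum-likelihood training yields a calibrated model.
assert np.isclose(softmax(W[tok["the"]]).sum(), 1.0)
```

Since every step's probability is strictly less than 1, the total log probability is always negative; training pushes it toward zero on real text.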
So the base model can effectively take on all of these different personas or generate all these different kinds of content. Then when we do post-training, we're usually targeting a narrower range of behavior: we basically want the model to behave like this kind of chat assistant. It's a more specific persona that's trying to be helpful. It's not trying to imitate a person.

It's answering your questions or doing your tasks. We're optimizing a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just trying to imitate this raw content from the web. Yeah. Okay.
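Schulman doesn't spell out the objective here, but one common ingredient of this kind of post-training (used in RLHF pipelines) is a reward model trained on human preference comparisons with a Bradley-Terry style loss. The sketch below is a hedged illustration of that idea with toy, hypothetical reward values, not a description of OpenAI's actual implementation.

```python
import numpy as np

def pairwise_preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected): the Bradley-Terry style loss
    used to train reward models from human comparisons. It is small when
    the human-preferred response scores above the rejected one."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Toy scores: a reward model that already ranks the preferred answer
# higher incurs a small loss; one that ranks it lower incurs a large loss.
good = pairwise_preference_loss(2.0, -1.0)
bad = pairwise_preference_loss(-1.0, 2.0)
assert good < bad
```

Optimizing the policy against such a reward model is what shifts the objective from "imitate the web" to "produce outputs humans will like and find useful."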
I think maybe I should take a step back and ask: right now we have these models that are pretty good at acting as chatbots. Stepping back from how these processes work currently, what kinds of things will the models released at the end of the year be capable of doing?

What do you see the progress looking like, you know, carrying this forward over the next five years?