Aman Sanger

👤 Speaker

1050 total appearances

Podcast Appearances

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

Yeah. So one thing to do would be, I think, to train a process reward model. Maybe we can get into reward models, and outcome reward models versus process reward models. Outcome reward models are the traditional kind of reward models that people train for language modeling, and they just look at the final thing.

7261.468

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

So if you're doing some math problem, let's look at that final thing, everything you've done, and let's assign a grade to it: what's the reward for this outcome? Process reward models instead try to grade the chain of thought.

7281.79
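
To make the distinction above concrete, here is a minimal sketch of the two grading styles. Everything in it is a toy assumption for illustration: real reward models are learned, whereas `grade_step`, `outcome_reward`, and `process_rewards` are hypothetical rule-based stand-ins that check simple arithmetic.

```python
import operator
from typing import List

# Toy stand-ins (assumptions): real reward models are learned, not rule-based.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def grade_step(step: str) -> float:
    """Hypothetical step grader: 1.0 if 'a op b = c' is arithmetically true."""
    left, right = step.split("=")
    a, op, b = left.split()
    return 1.0 if OPS[op](int(a), int(b)) == int(right) else 0.0


def outcome_reward(final_answer: int, expected: int) -> float:
    """Outcome-style reward: looks only at the final result."""
    return 1.0 if final_answer == expected else 0.0


def process_rewards(steps: List[str]) -> List[float]:
    """Process-style reward: one grade per step of the chain of thought."""
    return [grade_step(s) for s in steps]


# Target problem: (2 + 3) * 4 - 1 = 19. Two wrong steps happen to cancel out.
steps = ["2 + 3 = 6", "6 * 4 = 20", "20 - 1 = 19"]
final_reward = outcome_reward(final_answer=19, expected=19)
step_rewards = process_rewards(steps)
print(final_reward, step_rewards)
```

In this toy case the outcome-style grade is perfect because the final answer is right, while the per-step grades expose the two faulty intermediate steps.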

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And so OpenAI had some preliminary paper on this, I think, last summer, where they used human labelers to get this pretty large data set, several hundred thousand, of graded chains of thought. Ultimately, I haven't seen anything interesting in the ways that people use process reward models outside of just using them as a means of choosing between a bunch of samples.

7295.697

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

So what people do in all these papers is sample a bunch of outputs from the language model, use the process reward models to grade all those generations, alongside maybe some other heuristics, and then use that to choose the best answer. The really interesting thing that people think might work, and want to work, is tree search with these process reward models.

7322.792
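
The sample-then-grade selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the candidates are hard-coded rather than sampled from a language model, `toy_step_score` is a hypothetical stand-in for a learned process reward model, and multiplying the step scores is one possible way to aggregate them, not necessarily what any particular paper does.

```python
import math
from typing import Callable, List, Sequence

Chain = Sequence[str]


def toy_step_score(step: str) -> float:
    """Hypothetical PRM stand-in: high score if 'a op b = c' is true."""
    left, right = step.split("=")
    a, op, b = left.split()
    value = {"+": int(a) + int(b), "-": int(a) - int(b), "*": int(a) * int(b)}[op]
    return 0.9 if value == int(right) else 0.1


def chain_score(steps: Chain, step_score: Callable[[str], float]) -> float:
    """Aggregate per-step scores into one score (product is an assumption)."""
    return math.prod(step_score(s) for s in steps)


def best_of_n(candidates: List[Chain], step_score: Callable[[str], float]) -> Chain:
    """Pick the candidate whose chain of thought scores highest."""
    return max(candidates, key=lambda steps: chain_score(steps, step_score))


# In practice these would be N samples drawn from the language model.
candidates = [
    ["2 + 3 = 6", "6 * 4 = 24"],
    ["2 + 3 = 5", "5 * 4 = 20"],
    ["2 + 3 = 5", "5 * 4 = 25"],
]
best = best_of_n(candidates, toy_step_score)
print(best)
```

Other heuristics, such as majority voting over final answers, could be combined with the score inside `best_of_n`; that part is left out here.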

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

Because if you really can grade every single step of the chain of thought, then you can kind of branch out and explore multiple paths of this chain of thought, and then use these process reward models to evaluate how good the branch you're taking is.

7345.144
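
One simple way to picture this branching is a beam search over partial chains, sketched below. It is a toy under loud assumptions: `propose` stands in for a language model suggesting next steps, `score` stands in for a process reward model grading a partial branch, and the "reasoning" is just reaching a hypothetical `TARGET` number from 1 using "+1" and "*2" steps. Beam search is only one of several possible tree-search strategies.

```python
from typing import Callable, List, Tuple

Chain = Tuple[str, ...]
TARGET = 16  # hypothetical goal for the toy problem


def chain_value(chain: Chain) -> int:
    """Apply each step to the starting value 1."""
    value = 1
    for step in chain:
        value = value + 1 if step == "+1" else value * 2
    return value


def propose(chain: Chain) -> List[str]:
    """Stand-in for a language model proposing next steps."""
    return ["+1", "*2"]


def score(chain: Chain) -> float:
    """Stand-in for a process reward model grading a partial branch."""
    return -abs(TARGET - chain_value(chain))


def beam_search(
    propose: Callable[[Chain], List[str]],
    score: Callable[[Chain], float],
    width: int,
    depth: int,
) -> Chain:
    """Expand every branch in the beam, keep the `width` best, repeat."""
    beam: List[Chain] = [()]
    for _ in range(depth):
        candidates = [chain + (step,) for chain in beam for step in propose(chain)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:width]
    return beam[0]


best = beam_search(propose, score, width=2, depth=4)
print(best, chain_value(best))
```

Because the beam keeps only the top-scoring branches at each depth, a poor step grader can prune the path that would have led to the best answer, which is one reason the quality of the process reward model matters so much here.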

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And the interesting work that I think has been done, or at least the interesting work that has been open sourced and that people talk about, is how to train the process reward models, maybe in a more automated way.

7372.864

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

I could be wrong here, I could be failing to mention something, but I haven't seen anything that seems to work really well for using the process reward models creatively to do tree search and code.

7387.718
