Aman Sanger

👤 Speaker

1050 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

I think context length is another obvious one. So if you care, like, let's say you care about the two things of inference compute and then context window, maybe the thing you want to train is some kind of SSM because they're much, much cheaper and faster at super, super long context.

8258.849 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And even if maybe it is 10X worse scaling properties during training, meaning you have to spend 10X more compute to train the thing to get the same level of capabilities, it's worth it because you care most about that inference budget for really long context windows. So it'll be interesting to see how people kind of play with all these dimensions.

8275.493 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And even if maybe it is 10X worse scaling properties during training, meaning you have to spend 10X more compute to train the thing to get the same level of capabilities, it's worth it because you care most about that inference budget for really long context windows. So it'll be interesting to see how people kind of play with all these dimensions.

8275.493 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And even if maybe it is 10X worse scaling properties during training, meaning you have to spend 10X more compute to train the thing to get the same level of capabilities, it's worth it because you care most about that inference budget for really long context windows. So it'll be interesting to see how people kind of play with all these dimensions.

8275.493 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

I mean, I think bigger is certainly better for just raw performance.

8322.085 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

I mean, I think bigger is certainly better for just raw performance.

8322.085 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

I mean, I think bigger is certainly better for just raw performance.

8322.085 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And raw intelligence. I think that the path that people might take is, I'm particularly bullish on distillation. And like, yeah, how many knobs can you turn to if we spend like a ton, ton of money on training, like get the most capable, cheap model?

8328.85 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And raw intelligence. I think that the path that people might take is, I'm particularly bullish on distillation. And like, yeah, how many knobs can you turn to if we spend like a ton, ton of money on training, like get the most capable, cheap model?

8328.85 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

And raw intelligence. I think that the path that people might take is, I'm particularly bullish on distillation. And like, yeah, how many knobs can you turn to if we spend like a ton, ton of money on training, like get the most capable, cheap model?

8328.85 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

like really really caring as much as you can because like the the naive version of caring as much as you can about inference time compute is what people have already done with like the llama models or just over training the shit out of 7b models um on way way way more tokens than essential optimal right but if you really care about it maybe the thing to do is what gamma did which is let's just not let's not just train on tokens let's literally train on uh

8344.72 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

like really really caring as much as you can because like the the naive version of caring as much as you can about inference time compute is what people have already done with like the llama models or just over training the shit out of 7b models um on way way way more tokens than essential optimal right but if you really care about it maybe the thing to do is what gamma did which is let's just not let's not just train on tokens let's literally train on uh

8344.72 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

like really really caring as much as you can because like the the naive version of caring as much as you can about inference time compute is what people have already done with like the llama models or just over training the shit out of 7b models um on way way way more tokens than essential optimal right but if you really care about it maybe the thing to do is what gamma did which is let's just not let's not just train on tokens let's literally train on uh

8344.72 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

minimizing the KL divergence with the distribution of gamma 27B, right? So knowledge distillation there. And you're spending the compute of literally training this 27 billion model, billion parameter model on all these tokens just to get out this, I don't know, smaller model.

8369.105 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

minimizing the KL divergence with the distribution of gamma 27B, right? So knowledge distillation there. And you're spending the compute of literally training this 27 billion model, billion parameter model on all these tokens just to get out this, I don't know, smaller model.

8369.105 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

minimizing the KL divergence with the distribution of gamma 27B, right? So knowledge distillation there. And you're spending the compute of literally training this 27 billion model, billion parameter model on all these tokens just to get out this, I don't know, smaller model.

8369.105 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

Yeah, distillation in theory is... I think getting out more signal from the data that you're training on. And it's like another, it's perhaps another way of getting over, not like completely over, but like partially helping with the data wall where like you only have so much data to train on.

8390.179 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

Yeah, distillation in theory is... I think getting out more signal from the data that you're training on. And it's like another, it's perhaps another way of getting over, not like completely over, but like partially helping with the data wall where like you only have so much data to train on.

8390.179 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

Yeah, distillation in theory is... I think getting out more signal from the data that you're training on. And it's like another, it's perhaps another way of getting over, not like completely over, but like partially helping with the data wall where like you only have so much data to train on.

8390.179 View full episode →

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

Let's like train this really, really big model on all these tokens and we'll distill it into a smaller one. And maybe we can get more signal per token for this much smaller model than we would have originally if we trained it.

8405.736 View full episode →

← Previous Page 50 of 53 Next →

Report any issue