Aman Sanger
Podcast Appearances
I mean, I think bigger is certainly better for just raw performance.
And raw intelligence. I think the path that people might take is, I'm particularly bullish on distillation. How many knobs can you turn so that, if we spend a ton of money on training, we get the most capable cheap model?
Like really, really caring as much as you can about inference-time compute. The naive version of that is what people have already done with the Llama models: just over-training the shit out of 7B models on way, way more tokens than is Chinchilla-optimal. But if you really care about it, maybe the thing to do is what Gemma did, which is: let's not just train on tokens, let's literally train on minimizing the KL divergence with the distribution of Gemma 27B, right? So, knowledge distillation there. You're spending the compute of literally training this 27-billion-parameter model on all these tokens just to get out this smaller model.
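To make that objective concrete, here is a minimal PyTorch-style sketch of a token-level knowledge-distillation loss of the kind being described: the student is trained to match the teacher's full output distribution rather than just the observed next token. This is not Gemma's actual training code; the function name, tensor shapes, and temperature parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Per-token KL(teacher || student) over the vocabulary.

    Both logit tensors are assumed to have shape (batch, seq_len, vocab_size),
    with the teacher logits coming from a frozen, larger model.
    """
    vocab_size = student_logits.size(-1)
    t = temperature
    # Flatten to (num_tokens, vocab_size) so 'batchmean' averages per token.
    student_log_probs = F.log_softmax(student_logits.reshape(-1, vocab_size) / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits.reshape(-1, vocab_size) / t, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```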
Yeah, distillation, in theory, is getting more signal out of the data that you're training on. It's perhaps another way of getting over the data wall, where you only have so much data to train on; not completely over it, but partially helping with it.
Let's train this really, really big model on all these tokens, and we'll distill it into a smaller one. And maybe we can get more signal per token for this much smaller model than we would have gotten if we had trained it on that data directly.
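As a rough illustration of that recipe, the sketch below wires the loss above into a single training step: a large, frozen teacher runs over the same tokens, and only the small student gets gradient updates. The teacher and student objects and their call signatures are hypothetical placeholders, not any particular lab's setup.

```python
import torch

def distillation_step(teacher, student, optimizer, input_ids, temperature=1.0):
    """One optimization step distilling a frozen teacher into a small student.

    Assumes both models map a (batch, seq_len) tensor of token ids to logits
    of shape (batch, seq_len, vocab_size) over a shared vocabulary.
    """
    teacher.eval()
    with torch.no_grad():                   # the teacher only provides targets
        teacher_logits = teacher(input_ids)
    student_logits = student(input_ids)

    loss = distillation_loss(student_logits, teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```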
Yeah, I think there are a lot of these secrets and details about training these large models that I just don't know and that only the large labs are privy to. And the issue is, I would waste a lot of that money if I even attempted this, because I wouldn't know those things.