Aman Sanger

Person
1050 total appearances

Podcast Appearances

Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

There's this interesting thing where, if you look at language model loss on different domains, I believe the bits per byte, which is kind of a character-normalized loss, is lower for code than for language, which means that, in general, there are a lot of tokens in code that are super predictable, a lot of characters that are super predictable.
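
To make the bits-per-byte comparison concrete, here is a minimal sketch of how such a metric could be computed from per-token losses. The function name and all loss values are illustrative assumptions, made up for the example; they are not measurements from the episode.

```python
import math

def bits_per_byte(token_nll_nats, text):
    """Convert per-token loss (negative log-likelihood, in nats) to bits per UTF-8 byte."""
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    n_bytes = len(text.encode("utf-8"))             # normalize by bytes, not tokens
    return total_bits / n_bytes

# Hypothetical per-token losses, invented purely for illustration.
code_text = "for i in range(10):"
code_losses = [1.2, 0.3, 0.2, 0.1, 0.4, 0.1]
prose_text = "The weather is nice"
prose_losses = [2.0, 1.5, 1.1, 1.8]

print("code :", bits_per_byte(code_losses, code_text))
print("prose:", bits_per_byte(prose_losses, prose_text))
```

Dividing by bytes rather than by tokens is what makes the numbers comparable across domains that tokenize differently.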

And this is, I think, even magnified when you're not just trying to autocomplete code, but predicting what the user is going to do next in their editing of existing code. And so, you know, the goal of Cursor Tab is: let's eliminate all the low-entropy actions you take inside of the editor. When the intent is effectively determined, let's just jump you forward in time, skip you forward.

Yeah. I think I can speak to a few of the details on how to make these things work. They're incredibly low latency, so you need to train small models on this task. In particular... they're incredibly prefill-token hungry. What that means is they have these really, really long prompts where they see a lot of your code, and they're not actually generating that many tokens.

And so the perfect fit for that is using a sparse model, meaning an MoE model. So that was one breakthrough we made that substantially improved performance at longer context. The other being a variant of speculative decoding that we kind of built out, called speculative edits.
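
The excerpt names speculative edits but does not describe how they work. As a rough illustration only, the sketch below shows generic greedy speculative decoding in which the draft is assumed to be the existing, unedited code. The toy model, the chunk size, and the substitution-only alignment are all simplifying assumptions for this example, not the method described in the episode.

```python
def make_toy_model(target, prompt_len, eos="<eos>"):
    """Toy stand-in for a language model: it always wants to emit `target`."""
    def next_token(prefix):
        i = len(prefix) - prompt_len
        return target[i] if i < len(target) else eos
    return next_token

def speculative_decode(next_token, prompt, draft, chunk=4, eos="<eos>"):
    """Greedy speculative decoding that uses `draft` as the guessed continuation.

    Each loop iteration stands for one verification pass; a real model would
    score the whole chunk in parallel, which is simulated here token by token.
    """
    out, passes, d = [], 0, 0
    while True:
        guess = draft[d:d + chunk]
        passes += 1
        accepted = 0
        for i, g in enumerate(guess):
            if next_token(prompt + out + guess[:i]) != g:
                break  # first disagreement ends the accepted run
            accepted += 1
        out.extend(guess[:accepted])
        d += accepted
        # The same pass also yields the model's own token at the next position.
        tok = next_token(prompt + out)
        if tok == eos:
            return out, passes
        out.append(tok)
        d += 1  # assumes the edit substitutes one draft token (no insert/delete)

prompt = ["fix", "the", "bug", ":"]
draft = "def add ( a , b ) : return a - b".split()   # existing code
target = "def add ( a , b ) : return a + b".split()  # what the toy model emits
out, passes = speculative_decode(make_toy_model(target, len(prompt)), prompt, draft)
print(out == target, passes, "passes instead of", len(target) + 1)
```

Because most of the rewritten code matches the original, long runs of the draft are accepted per pass, and the model only has to contribute its own token where the edit actually differs.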

These are, I think, two important pieces of what makes it quite high quality and very fast.

Caching plays a huge role. Because you're dealing with this many input tokens, if, on every single keystroke that you're typing in a given line, you had to rerun the model on all of those tokens passed in, you're just going to, one, significantly degrade latency, and two, kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching-aware.
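
One way to picture a caching-aware prompt is to compare how much of the prompt stays identical between two consecutive keystrokes under different layouts. The sketch below is an assumption-laden toy: `build_prompt`, the token lists, and the two layouts are hypothetical and only illustrate why the part that changes on every keystroke is better placed at the end.

```python
def common_prefix_len(a, b):
    """Number of leading tokens shared by two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def build_prompt(file_context, current_line, volatile_first):
    """Hypothetical layout switch: put the changing line first or last."""
    if volatile_first:
        return current_line + file_context
    return file_context + current_line

# Made-up tokens: the surrounding file is stable, the typed line changes.
file_context = ["import", "os", "def", "main", "(", ")", ":"]
before_keystroke = ["pri"]
after_keystroke = ["prin"]

for volatile_first in (True, False):
    p1 = build_prompt(file_context, before_keystroke, volatile_first)
    p2 = build_prompt(file_context, after_keystroke, volatile_first)
    shared = common_prefix_len(p1, p2)
    print("volatile_first=%s: %d of %d tokens reusable" % (volatile_first, shared, len(p2)))
```

With the changing line first, the two prompts diverge at the first token and nothing can be reused; with it last, everything before the typed line is a shared prefix.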

And then, yeah, you need to reuse the KV cache across requests just so that you're spending less work, less compute.
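
As a rough sketch of what reusing the KV cache across requests saves, the toy class below keeps the previous request's entries and computes fresh ones only for tokens after the shared prefix. `PrefixKVCache` and its fake KV entries are hypothetical stand-ins, and treating each keystroke as one appended token is a simplification of real tokenization.

```python
class PrefixKVCache:
    """Toy stand-in for a KV cache that remembers the previous request."""

    def __init__(self):
        self.tokens = []   # tokens of the last request
        self.kv = []       # one fake KV entry per cached token
        self.computed = 0  # total tokens that needed fresh compute

    def _compute_kv(self, token):
        # Placeholder for the attention keys/values a real model would produce.
        self.computed += 1
        return ("kv", token)

    def prefill(self, tokens):
        """Process a request and return how many leading tokens were reused."""
        shared = 0
        for old, new in zip(self.tokens, tokens):
            if old != new:
                break
            shared += 1
        # Keep entries for the shared prefix, compute only the remainder.
        fresh = [self._compute_kv(t) for t in tokens[shared:]]
        self.kv = self.kv[:shared] + fresh
        self.tokens = list(tokens)
        return shared

cache = PrefixKVCache()
context = ["tok%d" % i for i in range(1000)]  # made-up long prompt
cache.prefill(context + ["p"])                # first request pays full cost
reused = cache.prefill(context + ["p", "r"])  # next keystroke reuses the prefix
print("reused:", reused, "total computed:", cache.computed)
```

Reuse of this kind is valid only for a shared prefix, since each position's keys and values depend on every token before it.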

This is what we're talking about.