Aman Sanger

Speaker
350 total appearances

Podcast Appearances

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

For long context with large batch sizes, you're bottlenecked by how quickly you can read those cached keys and values. That's memory bandwidth. So how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these.

Where normally with multi-head attention, you have some number of, quote-unquote, attention heads and some number of query heads, multi-query just preserves the query heads and gets rid of all the key-value heads. So there's only one key-value head, and there's all the remaining query heads.

With grouped-query, you instead preserve all the query heads, and there are fewer heads for the keys and values, but you're not reducing it to just one. But anyway, the whole point here is you're just reducing the size of your KV cache.
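
To make the size difference concrete, here is a minimal sketch of the per-token KV cache footprint under the three schemes. The model shape (32 layers, 32 query heads, head dimension 128, 2-byte values) and the grouped-query setting of 8 key-value heads are assumptions picked for illustration, not figures from the conversation.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored for one token across all layers."""
    # Factor of 2: one key vector and one value vector per key-value head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
HEAD_DIM = 128

# Number of key-value heads kept by each scheme.
kv_heads = {
    "multi-head": NUM_QUERY_HEADS,  # one key-value head per query head
    "grouped-query": 8,             # assumed: 4 query heads share each key-value head
    "multi-query": 1,               # all query heads share a single key-value head
}

sizes = {
    name: kv_bytes_per_token(NUM_LAYERS, heads, HEAD_DIM)
    for name, heads in kv_heads.items()
}

for name, size in sizes.items():
    print(f"{name:>13}: {size // 1024} KiB per token")
```

The query heads never enter the formula, which is why all three schemes can keep the same number of them while the cache shrinks.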

Yeah, multi-latent. That's a little more complicated. The way that this works is it kind of turns the entirety of your keys and values across all your heads into this one latent vector that is then expanded at inference time.
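
A rough sketch of that compress-then-expand idea, in plain Python. All sizes and the random weight matrices are hypothetical, and the sketch leaves out details that real implementations need (positional encodings, for example). It only shows that the cache holds the small latent vector while keys and values are rebuilt from it on demand.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned projections."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D_MODEL, D_LATENT, NUM_HEADS, HEAD_DIM = 16, 4, 2, 8  # hypothetical sizes

w_down = random_matrix(D_LATENT, D_MODEL, rng)               # hidden -> latent
w_up_k = random_matrix(NUM_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> keys, all heads
w_up_v = random_matrix(NUM_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> values, all heads

hidden = [rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)]

# Only this small latent vector is stored in the cache for the token.
latent = matvec(w_down, hidden)

# At inference time the latent is expanded back into keys and values for every head.
keys = matvec(w_up_k, latent)
values = matvec(w_up_v, latent)

cached_floats = len(latent)
expanded_floats = len(keys) + len(values)
print(f"cached {cached_floats} floats instead of {expanded_floats}")
```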

What, I mean, ultimately, how does that map to the user experience?

Yeah, there are two things that it maps to. First, you can now make your cache a lot larger, because you have less space allocated for the KV cache. You can maybe cache a lot more aggressively and a lot more things, so you get more cache hits, which are helpful for reducing the time to first token for the reasons that were described earlier. And then the second is that when you start doing inference with more and more requests and larger and larger batch sizes, you don't see much of a slowdown in the speed at which it's generating the tokens.

Yeah. So basically, the size of your KV cache is the size of your prompts multiplied by the number of prompts being processed in parallel. So you could increase either of those dimensions, right? The batch size or the size of your prompts, without degrading the latency of generating tokens.
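
As a worked example of that multiplication, the sketch below computes the total cache size and the largest batch that fits in a fixed memory budget. The 16 GiB budget, the 8192-token prompts, and the two per-token footprints (512 KiB and 16 KiB) are assumed numbers for illustration only.

```python
def kv_cache_bytes(bytes_per_token, prompt_tokens, batch_size):
    """Total KV cache size: per-token footprint x prompt length x parallel prompts."""
    return bytes_per_token * prompt_tokens * batch_size


def max_batch_size(memory_budget_bytes, bytes_per_token, prompt_tokens):
    """Largest number of prompts whose KV cache fits in the memory budget."""
    return memory_budget_bytes // (bytes_per_token * prompt_tokens)


# Hypothetical numbers, chosen only for illustration.
BUDGET_BYTES = 16 * 1024**3      # 16 GiB set aside for the KV cache
PROMPT_TOKENS = 8192
LARGE_FOOTPRINT = 512 * 1024     # assumed bytes per token with an uncompressed cache
SMALL_FOOTPRINT = 16 * 1024      # assumed bytes per token with a compressed cache

large_batch = max_batch_size(BUDGET_BYTES, LARGE_FOOTPRINT, PROMPT_TOKENS)
small_batch = max_batch_size(BUDGET_BYTES, SMALL_FOOTPRINT, PROMPT_TOKENS)

print(f"uncompressed cache: batch of {large_batch}")
print(f"compressed cache:   batch of {small_batch}")
```

With the same budget, a smaller per-token footprint can be spent on either dimension: more prompts in parallel or longer prompts.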

One maybe hacky but interesting idea that I like is holding a lock on saving. And so basically, you can then have the language model kind of hold the lock on saving to disk.

And then instead of you operating in the ground truth version of the files that are saved to disk, you actually are operating in what was the shadow workspace before, on these unsaved things that only exist in memory, that you still get linter errors for and you can code in. And then when you try to maybe run code, there's just a small warning that there's a lock, and then you take back the lock from the language server, or from the shadow workspace, if you're trying to do things concurrently.

That's such an exciting future, by the way. It's a bit of a tangent, but, like, to allow a model to change files...
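
The lock-on-saving idea described above can be sketched roughly as follows. Everything here is hypothetical: the `SaveLock` and `Workspace` classes, the `"model"` and `"user"` labels, and the dictionaries standing in for disk and memory are illustrative names, not how Cursor implements it.

```python
import threading


class SaveLock:
    """Tracks which party is currently allowed to save files to disk."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.holder = None  # e.g. "model" or "user"

    def acquire(self, who):
        """Return True if `who` holds the lock after this call."""
        with self._mutex:
            if self.holder is None:
                self.holder = who
            return self.holder == who

    def take_back(self, who):
        """Hand the lock to `who` and return the previous holder."""
        with self._mutex:
            previous, self.holder = self.holder, who
            return previous


class Workspace:
    def __init__(self):
        self.disk = {}    # stand-in for files saved to disk
        self.memory = {}  # unsaved buffers that only exist in memory
        self.lock = SaveLock()

    def edit(self, path, text):
        # Edits always land in memory, whoever holds the save lock.
        self.memory[path] = text

    def save(self, who, path):
        # Saving to disk only succeeds for the lock holder.
        if not self.lock.acquire(who):
            return False
        self.disk[path] = self.memory[path]
        return True

    def run(self, who):
        # Running code takes the lock back and reports a warning if someone else held it.
        previous = self.lock.take_back(who)
        if previous not in (None, who):
            return f"warning: save lock was held by {previous}"
        return None


ws = Workspace()
ws.lock.acquire("model")              # the model holds the lock on saving
ws.edit("main.py", "print('hi')")     # the user keeps coding in memory
print(ws.save("user", "main.py"))     # blocked while the model holds the lock
print(ws.run("user"))                 # small warning, then the user has the lock
print(ws.save("user", "main.py"))     # now the save goes through
```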

Yeah. And I think there may be different versions of, like, runnability. For the simple things, where you're doing things in the span of a few minutes on behalf of the user as they're programming, it makes sense to make something work locally on their machine.

I think for the more aggressive things, where you're making larger changes that take longer periods of time, you'll probably want to do this in some sandboxed remote environment. And that's another incredibly tricky problem: how do you exactly reproduce, or mostly reproduce to the point of it being effectively equivalent for running code, the user's environment with this remote sandbox?

I'm curious what kind of agency you want for coding. Do you want them to find bugs? Do you want them to, like, implement new features? Like, what agency do you want?

Yeah. I mean, it's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.

Exactly. Even o1. How do you explain that?

I think these models are a really strong reflection of the pre-training distribution. And, you know, I do think they generalize as the loss gets lower and lower, but I don't think the loss is low enough such that they're really fully generalizing in code. The things that we use the frontier models for, that they're quite good at, are really code generation and question answering. And these things exist in massive quantities in pre-training, with all of the code on GitHub on the scale of many, many trillions of tokens, and questions and answers on things like Stack Overflow and maybe GitHub issues. And so when you try to push into these things that really don't exist very much online, like, for example, the Cursor Tab objective of predicting the next edit given the edits done so far, the brittleness kind of shows. And then bug detection is another great example, where there aren't really that many examples of actually detecting real bugs and then proposing fixes, and the models just really struggle at it. But I think it's a question of transferring the model.

In the same way that you get this fantastic transfer from pre-trained models just on code in general to the Cursor Tab objective, you'll see a very, very similar thing with generalized models that are really good at code to bug detection. It just takes a little bit of nudging in that direction.

How paranoid is the user? But even then, if you're putting in maximum paranoia, it still just doesn't quite get it.