Aman Sanger
Podcast Appearances
That's amazing. And you can do, like, other fancy things where if you have lots of code blocks from the entire code base, you could use retrieval and things like embedding and re-ranking scores to add priorities for each of these components.
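A minimal sketch of that idea: retrieve candidate code blocks from across the codebase with a cheap embedding similarity pass, then use a more expensive re-ranking score to set each block's priority. The `embed` and `rerank_score` functions here are hypothetical stand-ins for whatever embedding model and re-ranker would actually be used; only the two-stage prioritization logic reflects what's described above.

```python
# Sketch: prioritize code blocks for the prompt via retrieval + re-ranking.
from dataclasses import dataclass

import numpy as np


@dataclass
class CodeBlock:
    path: str
    text: str
    priority: float = 0.0


def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call; replace with a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)


def rerank_score(query: str, block: CodeBlock) -> float:
    """Hypothetical re-ranker (e.g. a cross-encoder); toy word-overlap here."""
    overlap = len(set(query.split()) & set(block.text.split()))
    return overlap / (len(block.text.split()) + 1)


def prioritize_blocks(query: str, blocks: list[CodeBlock], top_k: int = 20) -> list[CodeBlock]:
    # Stage 1: cheap embedding similarity over all blocks from the codebase.
    q = embed(query)
    shortlist = sorted(blocks, key=lambda b: float(q @ embed(b.text)), reverse=True)[:top_k]

    # Stage 2: costlier re-ranking over the shortlist; the score becomes the
    # block's priority when packing the prompt under a token budget.
    for b in shortlist:
        b.priority = rerank_score(query, b)
    return sorted(shortlist, key=lambda b: b.priority, reverse=True)
```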
I think even as the system gets closer to some level of perfection, often when you ask the model for something, not enough intent is conveyed for it to know what to do. And there are a few ways to resolve that intent. One is the simple thing of having the model just ask you: I'm not sure how to do these parts based on your query, could you clarify that? I think the other could be maybe...
If there are five or six possible generations given the uncertainty present in your query so far, why don't we just actually show you all of those and let you pick them?
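One way to sketch "show several generations and let the user pick" is simply to sample multiple completions for the same ambiguous query. This assumes an OpenAI-style chat completions API, where the `n` parameter requests several samples in one call; the model name is a placeholder, and the surrounding product would render the options for the user to choose from.

```python
# Sketch: sample several candidate generations for an ambiguous query.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def candidate_generations(prompt: str, n: int = 5) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        n=n,                  # ask for several samples in one request
        temperature=1.0,      # higher temperature spreads the samples out
    )
    return [choice.message.content for choice in resp.choices]


for i, option in enumerate(candidate_generations("Refactor this function to be async"), 1):
    print(f"--- option {i} ---\n{option}\n")
```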
Yeah, I mean, we can go over a lot of the strategies that we use. One interesting thing is cache warming. As the user is typing, you know you're probably going to use some piece of context, and you can know that before the user's done typing. So, as we discussed before, reusing the KV cache results in lower latency and lower cost across requests.
So as the user starts typing, you can immediately warm the cache with, let's say, the current file contents. And then when they press enter, there are very few tokens it actually has to pre-fill and compute before starting the generation. This will significantly lower time to first token.
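Here is a minimal sketch of that cache-warming flow, assuming an inference server that keeps a prefix/KV cache resident between requests. `InferenceClient` and its `generate` method are hypothetical stand-ins for the real serving stack; the point is that the likely prefix (the current file contents) is sent while the user is still typing, so pressing enter only prefills the short query.

```python
# Sketch: warm the KV cache with the likely prompt prefix before the user submits.
import asyncio


class InferenceClient:
    async def generate(self, prompt: str, max_tokens: int) -> str:
        # Placeholder: a real client would call the serving endpoint; with
        # prefix caching enabled, the KV cache for `prompt` stays resident.
        await asyncio.sleep(0.05)
        return ""


async def on_user_typing(client: InferenceClient, file_contents: str) -> None:
    # Fire an early request so the shared prefix gets prefilled into the cache.
    await client.generate(prompt=file_contents, max_tokens=1)


async def on_user_submit(client: InferenceClient, file_contents: str, query: str) -> str:
    # The file contents are already cached, so prefill only covers the short
    # query, which lowers time to first token.
    return await client.generate(prompt=file_contents + "\n" + query, max_tokens=256)
```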
Yeah. So the way transformers work, the mechanism that allows transformers to not just independently look at each token, but to see previous tokens, are the keys and values in attention.
And generally the way attention works is you have, at your current token, some query, and then you have the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And by default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model.
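To make the keys-and-values point concrete, here is a toy single-head attention step with a KV cache, not any particular model's implementation: each new token computes only its own query, key, and value, and the query attends over the cached keys and values of all previous tokens, so earlier tokens never need another forward pass.

```python
# Toy single-head attention with a KV cache.
import numpy as np

d = 8                                    # toy head dimension
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []                # the KV cache: one entry per past token


def attend_next(x_t: np.ndarray) -> np.ndarray:
    """Process one new token embedding x_t against the cached keys/values."""
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)            # only the new token's K/V get computed
    v_cache.append(x_t @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)          # current query against all cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                   # weighted sum of cached values


# Feed tokens one at a time; each step reuses everything already in the cache.
for token_embedding in rng.normal(size=(5, d)):
    out = attend_next(token_embedding)
```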