David Shu
Podcast Appearances
Yes, they're very good at writing programs to do the arithmetic and very bad at doing the arithmetic. So it's a great compromise. The thing we do with Sketch is try to give the underlying model access to information about the environment it's writing code in using function calls. So a lot of our work is not fine-tuning the model.
It's about letting it ask questions about not just the standard library, but the other libraries it's trying to use, so that it can get better answers. It can look up the Go doc for a method if it thinks it wants to call it, and use that as part of its decision-making process about the code it generates.
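One concrete way to back that kind of lookup, sketched here as an assumption rather than a description of how Sketch actually does it, is to shell out to the standard go doc tool and hand its output back to the model. The lookupGoDoc helper name is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// lookupGoDoc runs `go doc` for a symbol such as "net/http.Client.Do"
// and returns the documentation text. The helper name is illustrative;
// it only shows how a tool call could be backed by real Go tooling.
func lookupGoDoc(symbol string) (string, error) {
	out, err := exec.Command("go", "doc", symbol).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("go doc %s: %w\n%s", symbol, err, out)
	}
	return string(out), nil
}

func main() {
	doc, err := lookupGoDoc("net/http.Client.Do")
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
}
```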
So at the beginning, in your system prompt or something like your system prompt, depending on the API and exactly how the model works, you say there is a function call, which is get method docs. And it has a parameter, which is the name of the method.
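In API terms, that declaration is typically a small schema sent alongside the conversation. A hedged sketch in Go: the tool name get_method_docs, the parameter method_name, and the ToolSpec field layout are illustrative choices, not any particular vendor's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolSpec mirrors the general shape of a function/tool declaration that
// chat APIs accept. Exact field names vary by provider; this is only an
// illustrative structure.
type ToolSpec struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

func main() {
	tool := ToolSpec{
		Name:        "get_method_docs",
		Description: "Return the Go documentation for a method or function.",
		Parameters: map[string]any{
			"type": "object",
			"properties": map[string]any{
				"method_name": map[string]any{
					"type":        "string",
					"description": "Fully qualified name, e.g. net/http.Client.Do",
				},
			},
			"required": []string{"method_name"},
		},
	}
	b, _ := json.MarshalIndent(tool, "", "  ")
	fmt.Println(string(b))
}
```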
You can construct a question to an LLM that says, generate a program that does this, with a system prompt which explains that there's a tool call there. And so as the LLM is generating that program, it can pause and make a tool call that says, get me the docs for this.
And so the LLM decides that it wants to know something about that method call, and then you go and run a program which gets the documentation for that method from the actual source of truth. You paste it into the prompt, and then the LLM continues writing the program, using that documentation as part of its prompt.
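Seen from the host program's side, that exchange is a loop: send the conversation, see whether the model asked for a tool, run it, append the result, and send again. A minimal sketch under those assumptions; the Model interface and message types are hypothetical stand-ins for whatever chat API the agent talks to.

```go
package agent

// Illustrative outline of the tool-call loop described above. Nothing here
// names a real provider SDK; the types are placeholders.

type Message struct {
	Role    string // "system", "user", "assistant", or "tool"
	Content string
}

type ToolCall struct {
	Name string
	Arg  string // e.g. the method name to look up
}

type Model interface {
	// Complete returns either finished assistant text or a requested tool call.
	Complete(history []Message) (text string, call *ToolCall, err error)
}

func runAgent(m Model, history []Message, lookupDocs func(string) (string, error)) (string, error) {
	for {
		text, call, err := m.Complete(history)
		if err != nil {
			return "", err
		}
		if call == nil {
			// No tool requested: the model has finished writing the program.
			return text, nil
		}
		// The model paused and asked for documentation; fetch it from the
		// source of truth and paste it back into the conversation.
		docs, err := lookupDocs(call.Arg)
		if err != nil {
			docs = "error: " + err.Error()
		}
		history = append(history, Message{Role: "tool", Content: docs})
	}
}
```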
And so this is the model driving the questions about what it wants to know about, and it just blocks and waits for that to come back.
If you step back to, like, running llama.cpp yourself or something like this, you can sort of oversimplify one of these models as: every time you want to generate a token, you hand the entire history of the conversation you've had, or whatever the text before it is, to the GPU to build the state of the model. And then it generates the next token.
It actually generates a probability value for every token in its token set. And then the CPU picks the next token, attaches it to the full set of tokens, and then does that whole process again of sending over the entire conversation and generating the next token. And so if you think about that very long, big, giant for loop around the outside: every time there's a new token, the token is chosen from the set of probabilities that comes back, is added to the set, and then a new set of probabilities is generated for the next token.
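That outer loop is roughly all the host does during inference. A hedged sketch of it in Go, with the forward pass abstracted behind an interface since the real work happens on the GPU; greedy sampling is used only to keep the sketch short.

```go
package decode

// Illustrative outline of the token-generation loop described above.

// LLM stands in for the forward pass: given the entire token history, it
// returns a probability for every token in the vocabulary.
type LLM interface {
	NextTokenProbs(history []int) []float32
}

// pickToken stands in for the sampling step the CPU performs; greedy
// argmax keeps the sketch minimal.
func pickToken(probs []float32) int {
	best := 0
	for i, p := range probs {
		if p > probs[best] {
			best = i
		}
	}
	return best
}

// generate runs the big outer for loop: send the whole history, get a
// probability for every token, pick one, append it, and repeat.
func generate(m LLM, prompt []int, maxTokens, eos int) []int {
	history := append([]int(nil), prompt...)
	for i := 0; i < maxTokens; i++ {
		probs := m.NextTokenProbs(history)
		tok := pickToken(probs)
		history = append(history, tok)
		if tok == eos {
			break
		}
	}
	return history
}
```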
You can imagine, in the middle of that for loop, having some very traditional code in there that inserts a stack of tokens that wasn't actually decided by the LLM, but then becomes part of the history that the LLM is generating the next token from. And so that's how those embeds work. You can effectively...
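In terms of the loop sketched above, a tool result is just a run of tokens spliced into the history by ordinary host code rather than chosen by sampling. A hedged variation on the same sketch; every helper name here is hypothetical.

```go
package decode

// generateWithTools extends the previous loop sketch: when the freshly
// sampled tokens end with a tool-call marker, ordinary host code runs the
// tool and splices the result's tokens into the history, even though the
// model never sampled them. All helper parameters are hypothetical.
func generateWithTools(
	nextTokenProbs func(history []int) []float32, // the forward pass on the GPU
	sample func(probs []float32) int, // the CPU-side token choice
	endsWithToolCall func(history []int) (arg string, ok bool),
	runTool func(arg string) string,
	tokenize func(s string) []int,
	prompt []int, maxTokens, eos int,
) []int {
	history := append([]int(nil), prompt...)
	for i := 0; i < maxTokens; i++ {
		tok := sample(nextTokenProbs(history))
		history = append(history, tok)
		if tok == eos {
			break
		}
		// Very traditional code in the middle of the for loop: if the model
		// just asked for documentation, insert a stack of tokens it did not
		// decide on itself; they become part of the history from here on.
		if arg, ok := endsWithToolCall(history); ok {
			history = append(history, tokenize(runTool(arg))...)
		}
	}
	return history
}
```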