Ryan Petersen
Podcast Appearances
Let me say it differently, Sacks. I think we have not yet figured out how to move budgets from experimentation to mainline production, meaning the point where large chunks of the US economy are comfortable enough with the ways in which hallucinations are managed that they will replace legacy deterministic code with this new probabilistic, model-generated code, meaning model-enabled code.
Let's just put it that way.
I think the question is, what is the dimension? We use Gemini. So for many tasks at 8090, we use Gemini. It's incredible. But for most of our code gen, we use Anthropic, and Claude kicks ass. It's exceptional.
It depends on the task. I agree with that. What Zach said is right. So it's kind of like, what problem are you trying to solve? And then you have to ride this technology wave that is compounding very quickly. All I was trying to get across is that the error rates have been diminishing, but not nearly as fast as you need for some sectors of the economy. So you can use a model to generate deterministic code, that's great. And as long as you unit test it and integration test it, it'll be fine. But I'm saying, if you're gonna use a model in production, in an environment where if stuff goes wrong there are consequences, we're not there yet.
You'll quickly find that when you layer these models on top of each other, the test-time compute costs are astronomical. And Aaron's probably dealt with this. It's like I get a bill from AWS, and it's like, oh, wait, hold on a second. I just... per 100,000 this month? What's going on? So you have to get to the bottom of it.
Okay, all of this speaks to something that's going to sound totally esoteric, but we all used to shit on QA, right? The least talented engineers were allocated to QA. I think in the world of AI, it'll end up being the most talented. Internally at 8090, we call it improvement engineering, and it's a total specialty. It's similar to when I kind of coined the growth team at Facebook.
I feel it's the same kind of moment, where improvement engineering is really the skill that turns toy apps and vibe coding into something that's very practical and real. And the leader of this team is steeped in things like the Japanese kata management from Toyota and quality systems.
And these are all the things that matter when you're trying to shrink the error rate down to zero so that you can use it reliably. And also to document it, so that if people want to question what happened or seek, you know, recompense, some way to come back and say, hey, that really harmed me. How do you even do that?
Like, these are all very complicated issues that will get sorted out. Super interesting, I think.
Bro, the plane just waits. Just text the pilot and just tell them you're... All right, listen.