Oxide and Friends

Engineering Rigor in the LLM Age

15 Jan 2026

Transcription

Chapter 1: What predictions are being made about LLMs and software engineering?

0.031 - 15.424 Bryan Cantrill

Last week, he made us wait for four minutes. Well, my man in the office says, hello, Adam, says Bryan's here. Hey, Bryan, how are you? I'm doing well. How are you? I'm doing very well. And we've got all the Oxide friends here. We've got David and Rain.

15.978 - 33.244 Adam Leventhal

And Rain. Great. You know, our predictions episode was only a week ago, and yet at least one of our predictions already feels like such a lock. It's amazing that it was even considered a prediction as little as a week ago.

33.224 - 53.053 Adam Leventhal

I think this ennui, the software engineer ennui, and then your absolutely brilliant naming of Deep Blue for this sense of software engineer ennui, wondering what the real purpose of anything is when the LLM could just do everything for them. It feels like this has already taken root in the last week. Is that my imagination?

53.073 - 71.237 Bryan Cantrill

It feels like this has been... I don't think it's just your imagination. I think it may also be my imagination. But when I saw someone, not even tag us, but just describe the feeling as Deep Blue, I was like, wow, this is really getting there. We've really made it.

71.817 - 85.033 Adam Leventhal

We made it. We've definitely arrived. How ironic would it be if that cease and desist came from IBM for naming, for sullying the good brand of Deep Blue into a kind of like...

85.013 - 88.339 Bryan Cantrill

I'll tell you, the predictions market does not have that one coming.

89.02 - 99.418 Adam Leventhal

Exactly. Let's see Deep Blue (disambiguation) on the Wikipedia page, where they actually need to clarify that we're not talking about the software engineering neo-depression, the LLM-based depression.

99.439 - 107.072 Bryan Cantrill

Yeah, if we see the Polymarket on that spiking, we know that there's a C&D coming and some insiders are profiting on it.

107.896 - 114.105 Adam Leventhal

Oh, it's my understanding that those insiders are supposed to be us, right? Isn't that the way, isn't that what Polymarket, isn't that the story? Isn't that who they serve?

Chapter 2: How can LLMs enhance the rigor in software development?

503.489 - 515.42 Adam Leventhal

I didn't tell you that. I don't even know what you're talking about. Yeah, exactly. But you also, I mean, in hindsight, today, would that be a fine first Rust project?

515.541 - 516.463 Bryan Cantrill

There was nothing. I think so.

516.483 - 518.487 Adam Leventhal

About the project itself. Yeah, right.

518.508 - 524.601 Bryan Cantrill

Yeah. I mean, it was even simpler than the thing that ended up being your first Rust project.

525.263 - 545.825 Adam Leventhal

Right. So you always want to have a good kind of first thing for these things. And I've been kind of waiting for a good... like, what is a good thing to use Claude Code on? Because I just want to see how it does on this stuff. And I had some relatively straightforward scalability work that needed to be done, a lock that needed to be broken up. I knew how I wanted to do it.

546.406 - 552.252 Adam Leventhal

It was going to be a little bit tedious, but I was just kind of curious to see how it did.

552.232 - 560.875 Bryan Cantrill

And it should be said that the idea here also was like, you're breaking up this lock in a way that many locks before it have been broken up. Is that fair to say?

561.175 - 590.966 Adam Leventhal

Yes, absolutely. What needs to be done here is actually really quite straightforward, and I can describe it pretty completely to Claude Code. And I'll drop a link to the actual bug itself, illumos 17816, so you can see exactly what the problem was at hand. Pretty straightforward.

591.527 - 614.544 Adam Leventhal

Now, very deliberately, I'm definitely not closing the loop, not vibe coding it, not one-shotting it, because in particular I'm not even gonna let it build anything, right? We're gonna go into the source base, and I just wanted to see how it did. And it really did, like, it did remarkably well.
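
To make the shape of that work concrete, here is a minimal, purely illustrative Rust sketch of breaking one coarse lock into per-bucket locks. It is not the actual change in the bug linked above (that work isn't in Rust, and all the names here are made up); it only shows the general technique of splitting a lock so unrelated operations stop contending.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Before: a single mutex serializes every operation on the table.
#[allow(dead_code)]
struct CoarseTable {
    inner: Mutex<HashMap<u64, String>>,
}

/// After: the table is split into independently locked buckets, so
/// operations on unrelated keys no longer contend on one lock.
struct ShardedTable {
    shards: Vec<Mutex<HashMap<u64, String>>>,
}

impl ShardedTable {
    fn new(nshards: usize) -> Self {
        Self {
            shards: (0..nshards).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    /// Pick the bucket for a key; only that bucket's lock is ever taken.
    fn shard(&self, key: u64) -> &Mutex<HashMap<u64, String>> {
        &self.shards[(key as usize) % self.shards.len()]
    }

    fn insert(&self, key: u64, value: String) {
        self.shard(key).lock().unwrap().insert(key, value);
    }

    fn get(&self, key: u64) -> Option<String> {
        self.shard(key).lock().unwrap().get(&key).cloned()
    }
}

fn main() {
    let table = ShardedTable::new(16);
    table.insert(1, "one".to_string());
    assert_eq!(table.get(1), Some("one".to_string()));
}
```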

Chapter 3: What issues have been encountered in using LLMs for coding?

827.417 - 848.516 Adam Leventhal

Yeah. I mean, this is one of these where, in many ways, I had biased it for maximal success. I had a pretty good idea of what it was going to look like. But there are also some fiddly bits, and I'll put a link to the diff in the actual bug. There are some fiddly bits to get right, actually.

848.536 - 861.388 Adam Leventhal

There's a little bit of math that you need to do correctly. But yes, I definitely knew what the code was going to look like. And it doesn't span multiple files. We're not introducing a new subsystem. This is pretty straightforward as it goes.

861.608 - 873.298 Adam Leventhal

So this is, I would say, a case that I really picked because it's kind of biased for success. I also picked it because we need to do it, by the way. I mean, that's the other thing. It's like, this is like...

873.278 - 878.523 Bryan Cantrill

This was not a yak shave. This was like, you were doing it in four hours or you're doing it in two hours, either way.

879.103 - 896.46 Adam Leventhal

Either way, it had to be done. That's exactly right. I would say the other thing is that the four hours versus two hours ends up being really actionable because I started this at 10 o'clock at night. And it was like, there's a pretty big difference between going to bed at midnight and going to bed at two in the morning. You know what I mean?

896.48 - 920.751 Adam Leventhal

In terms of, so, you know, sometimes like that, that difference can be, so anyway. It was pretty impressive and gave me the belief that we could actually use this in lots of other places. But that is my limited experience. I definitely want to... So we've got two of our colleagues here. We've got David and Rain here.

920.731 - 938.993 Adam Leventhal

And both of you have used LLMs quite a bit and have discovered, I would say, new vistas of rigor. Rain, do you want to kick us off on some of the stuff that you've done where you found this to be useful?

939.698 - 957.182 Rain Paharia

Sure. So there's a couple of different things I can talk about here. One of them is kind of the first work that I did that was around May of last year. And then the other one is like the work I did around December with like reorganizing types and stuff. Which one should I go with?

957.422 - 961.948 Adam Leventhal

Let's actually start chronologically because let's start as you're kind of getting into this stuff.

Chapter 4: How do LLMs compare to traditional coding practices?

1223.924 - 1250.247 Rain Paharia

I wanted to get that rigor where every method has a doc test associated with it. I don't know about you, but I hate writing 5,000 lines of doc tests. I just told the LLM to do that. I gave it a couple of examples to start with, and I just told Sonnet 4.1, I think, to do that. It just replicated those examples and wrote thousands of lines of doc tests.

1251.769 - 1276.941 Rain Paharia

This work that I'd been dreading because it would be weeks of work took me, I want to say, less than a day to get the whole thing ready. So it was three weeks of careful, deep analysis and work and thinking about unsafe and so on, and then one day of... I was talking to someone on Bluesky about this, and I think they described it as a pattern amplification machine.

1277.661 - 1278.522 Adam Leventhal

Yeah, interesting.

1278.963 - 1308.505 Rain Paharia

Right? And so you give it a pattern, and it just kind of amplifies that pattern to whatever degree you want, right? The thing is that before LLMs, I would have probably investigated a code generation library, I would have tried out macros or whatever, and all of them have some downsides. The LLM kind of doing things and tweaking things locally as it went along and like,

1308.485 - 1329.94 Rain Paharia

you know, things like, for a BTreeMap it'll say ordered, and for a HashMap it won't say that. Just making sure that the documentation is all aligned and everything. That was my first experience, and it was a great experience, where it wasn't a one-shot, but it was... I want to say maybe five or six prompts total.

1330.061 - 1333.965 Rain Paharia

And it just kind of just nailed it. And so that was my first experience.
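
For anyone who hasn't lived in Rust doc tests: every example inside a /// comment is compiled and run by cargo test, which is what makes thousands of them worth having. Here is a hypothetical sketch of the kind of pattern being amplified, including the wording difference Rain mentions between the ordered and hashed variants. The crate name mycrate and both functions are invented for illustration; this is not the actual library.

```rust
use std::collections::{BTreeMap, HashMap};

/// Returns the entries of an ordered map as (key, value) pairs.
///
/// The result is sorted by key, because `BTreeMap` iterates in key order.
///
/// ```
/// use std::collections::BTreeMap;
/// let m = BTreeMap::from([(2, "b"), (1, "a")]);
/// assert_eq!(mycrate::ordered_entries(&m), vec![(1, "a"), (2, "b")]);
/// ```
pub fn ordered_entries(m: &BTreeMap<u32, &'static str>) -> Vec<(u32, &'static str)> {
    m.iter().map(|(k, v)| (*k, *v)).collect()
}

/// Returns the entries of a hash map as (key, value) pairs.
///
/// The result is in no particular order, because `HashMap` iteration order
/// is unspecified.
///
/// ```
/// use std::collections::HashMap;
/// let m = HashMap::from([(2, "b"), (1, "a")]);
/// let mut entries = mycrate::entries(&m);
/// entries.sort();
/// assert_eq!(entries, vec![(1, "a"), (2, "b")]);
/// ```
pub fn entries(m: &HashMap<u32, &'static str>) -> Vec<(u32, &'static str)> {
    m.iter().map(|(k, v)| (*k, *v)).collect()
}
```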

1334.506 - 1353.154 Adam Leventhal

So, yeah, a bunch of follow-up questions. So that's really interesting. So one, I mean, this is the kind of tedium that you do. Just like you say about the doc tests. We all know the doc tests are great. As a user of something, you really appreciate them. It just takes a lot of time to get that working correctly.

1353.195 - 1372.701 Adam Leventhal

It's really easy when you, as a human, are, I mean, bluntly cutting and pasting, right? When you are cutting and pasting, it's super easy to make a mistake where it's like, oh, that doc test, by the way, have you looked at the doc test? You just cut and pasted it. You changed it in two places, but not the third.

1372.761 - 1377.186 Adam Leventhal

And so now like what you have is kind of nonsense in the test. Like, well, that's not very good.

Chapter 5: What are the implications of using LLMs in debugging?

1684.886 - 1714.741 Rain Paharia

But for me, there are new vistas that open up, and I think that's the way David put it. So there are things that were simply not feasible to do, given company priorities and personal life stuff going on and all the different things that are involved in a human's life, that I feel like have opened up. For me, the goal of this library was to increase the amount of rigor in our software.

1715.301 - 1727.781 Rain Paharia

I think it is very cool that it was able to work on this. This is a way you increase rigor, is you build an abstraction that increases rigor even if it is tedious. That is an increase in rigor in the overall system.

1727.761 - 1741.574 Adam Leventhal

Totally. Yeah. So David, as Rain points out, you were among the earliest adopters at Oxide. I think you've really shown the light for a lot of us, showing what these things can and can't do. Do you want to talk a little bit about your experience of kind of getting into this?

1743.739 - 1759.891 David Crespo

Uh, yeah, yeah. I mean, for a long time, I think until this year, really, when Claude Code took off, I was using LLMs as kind of like a fancy search, even before they were actually search engines. And everyone was like, it's not a search engine, because you're getting this very lossy picture of what's in the model weights.

1760.411 - 1780.391 David Crespo

Even then, on things that they were trained very well on, which is what I work on, web dev, they were great, even just for retrieval. So I was using them a lot for that, or for small snippets. This year, I think, is when it really took off that the models could really do more complex autonomous things based on a very small description.

1780.451 - 1795.351 David Crespo

And more importantly, I think, pull in, like what you're talking about, where when Claude Code is looking at the illumos code that you have on disk, it's pulling in context that it doesn't have. And that's very different from, yeah, you know, it's not so much...

1795.391 - 1809.399 David Crespo

You know, the typical use is, you ask it a one-sentence question, and there's only so much detail that you can get back out of it, because there's just not enough texture in the question to tell it what to tell you back.

1809.419 - 1832.379 David Crespo

And so when I gave the talk about LLMs at OxCon in September, a lot of what I stressed was the way to set up the problem for yourself is you want to give it enough so that the answer is in some sense contained in what you give it. And what these agent tools do, by just living in a repo and pulling in whatever context they want, is they give themselves that texture and context.

1832.399 - 1841.748 David Crespo

So that's really what's changed this year from the way I was using it a really long time ago. I was like, I was trying to, you know, I wrote a CLI that lets you pass stuff on standard in and you can dump files into it.
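
As a rough sketch of that idea (hypothetical code, not David's actual tool): a tiny program that reads the question from standard in, dumps the named files into the prompt, and thereby gives the model enough surrounding context that the answer is, in some sense, contained in what you hand it.

```rust
use std::env;
use std::fs;
use std::io::{self, Read};

fn main() -> io::Result<()> {
    // The question (or whatever you pipe in) comes from standard in.
    let mut question = String::new();
    io::stdin().read_to_string(&mut question)?;

    // Every file named on the command line gets dumped into the prompt,
    // delimited so the model can tell where each one starts and ends.
    let mut prompt = String::new();
    for path in env::args().skip(1) {
        let contents = fs::read_to_string(&path)?;
        prompt.push_str(&format!("=== {path} ===\n{contents}\n"));
    }
    prompt.push_str("=== question ===\n");
    prompt.push_str(&question);

    // From here the prompt would go to whichever model API you use;
    // printing it is enough to show the shape of the context being packed.
    print!("{prompt}");
    Ok(())
}
```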

Chapter 6: How can LLMs help improve documentation and code quality?

2087.941 - 2102.522 David Crespo

Yeah, you know, part of it was that the bug report itself was even to some extent out of my depth. A couple of them I was really confident in, and then one of them I was like, it sounds really good, but I just wasn't able to... I didn't know enough about how Ghostty worked or how Zig worked to really evaluate it.

2102.542 - 2110.834 David Crespo

So I was nervous, but I was, you know, up front with a lot of humility of like, I'm really not sure about this, but it sounds so good that I cannot hold it back.

2111.195 - 2131.649 Adam Leventhal

All right, so let's actually talk about the first two, where you're like, okay, I don't know any Zig, but I'm a software engineer, I know many other programming languages, and where you were like, okay, just based on its description and me looking at this code, I'm pretty sure I've got a legit bug here.

2132.249 - 2144.302 Adam Leventhal

Could you describe kind of those first two a little bit in terms of like, what did you, I mean, you had confidence. You could like, I can actually, not knowing very much Zig or knowing only this Zig I've learned, I think I've got a legit bug here.

2145.143 - 2163.052 David Crespo

Yeah, one of them was very simple because it was like a copy paste error where they were just referring to the wrong variable. And you could tell, you know, it was supposed to be grapheme bytes and it was hyperlink bytes, you know, and you could tell that that was, so it was like, okay, that sounds pretty straightforward. Another one. This was like two months ago, so I can't.

2163.152 - 2163.893 Adam Leventhal

Yeah, yeah. I'm so sorry.

2163.913 - 2180.314 David Crespo

Two months ago being several years ago, especially... The really complicated one was that it was a mutex lock that was not being taken at the right time, and so there was a conflict. And so, you know, reasoning about that was pretty tough for me, not understanding how the code worked.
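
Even without knowing Zig, the first of those bugs, the copy-paste one, is easy to picture. Here is a hypothetical Rust rendition of that wrong-variable shape; the function and the names are invented, loosely echoing the grapheme/hyperlink description above, and this is not the Ghostty code.

```rust
// Hypothetical illustration of the copy-paste, wrong-variable bug class:
// two nearly identical lines, and the second one silently reuses the
// first line's slice. (rustc would even warn that `hyperlink_bytes` is
// unused, which is one way this kind of bug gets caught.)
fn total_cell_bytes(grapheme_bytes: &[u8], hyperlink_bytes: &[u8]) -> usize {
    let grapheme_len = grapheme_bytes.len();
    // Bug: copied from the line above; should be `hyperlink_bytes.len()`.
    let hyperlink_len = grapheme_bytes.len();
    grapheme_len + hyperlink_len
}

fn main() {
    // With a 3-byte grapheme and a 10-byte hyperlink this prints 6, not 13.
    println!("{}", total_cell_bytes(b"abc", b"example.io"));
}
```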

Chapter 7: What advice is given for software engineers entering the LLM age?

2405.495 - 2423.301 Bryan Cantrill

But when I was at Sun, there were some times I'd get a code review where it was like, I really need to imagine what it would have been like to write this, so that I know what I'm looking for. In other cases, you're like, well, there's some code, and there's some tests, and I'll look around. But a lot of the thinking has probably already been done.

2423.281 - 2439.467 Adam Leventhal

And then on that, like, for me, you're exactly right. And like, I mean, look, I'm just ashamed to say it, but I'm gonna say it. Like the way I would review code from a nemesis, you know, a nemesis integrates code and you're like, I am gonna, I'm gonna get them.

2439.447 - 2458.165 Adam Leventhal

And I'm like, one of the things I realized I needed to do was for my own self-review and for reviewing people that were not my nemesis, I needed to channel that dark part of my brain that's like, I'm going to find this thing in here. And that's like, I mean, it's embarrassing to say, but it's definitely true.

2458.145 - 2468.199 Bryan Cantrill

Yeah, well, I do that because when I'm reviewing someone who I consider a friend, I want to do them the service of helping them with their code. But I guess we're just motivated differently, and that's fine. Yeah, OK.

2468.219 - 2477.132 Adam Leventhal

It feels like you're just trying to explain away a whole bunch of the code review comments. Very good code review comments. I mean, it feels like comments you'd give a nemesis. That's right. OK.

2477.112 - 2506.439 Bryan Cantrill

One man's nemesis, another man's friend. There you go. And then, David, as you're describing, you don't want to file a crap bug report. Man, have I seen some crap bug reports, where people take you on this wild ride through a core file and you end up just nowhere. You're like, okay, I'm following, but all of this is just blather. You don't need an LLM to hallucinate. We've been doing that. Yes.

2507.261 - 2524.264 Bryan Cantrill

And we've seen these bug reports where you're like, okay, there's certainly a lot of information here, but you've actually not contributed. So that same, that empathy you're talking about is so at the core of engineering full stop, irrespective of the tools we're using.

2524.7 - 2526.357 Adam Leventhal

Yeah, that's a very good point.

2526.607 - 2550.5 David Crespo

It is really infuriating to see a bad AI bug report. I'm probably more optimistic than most people about LLMs, and I think part of that is just working at Oxide, and I don't really see anybody doing the pathological things that we hear about online. Everybody's so careful and serious at Oxide, so I worry that I'm biased toward optimism because I'm not seeing the median user of these tools.

Chapter 8: How do LLMs influence the future of software engineering roles?

2648.824 - 2676.356 Bryan Cantrill

I don't need to, like, worry about kind of diverting you in the wrong places. A highly productive, unempathetic, careless colleague, like, that's what takes 150% of my effort just to keep them from doing harm. And you're right that it takes that formerly plodding colleague or collaborator who you had to keep on the rails, and it makes it much harder to steer them.

2678.399 - 2698.79 Rain Paharia

It's like a Gish gallop almost. I feel like that's how I think about it, right? Where it's like a Gish gallop for issues. I've luckily not faced too many crap bug reports. I've seen some AI bug reports, but they've all been very high quality, kind of at the standard that I think I would expect myself to write a bug report.

2698.81 - 2727.204 Rain Paharia

So again, I am biased towards optimism here, but it is something I'm worried about. I do look at people just putting up garbage and it's like, okay, well, it's now harder to filter out garbage. I have to say on the flip side, a thing I've done is I've used Opus 4.5 and fed it a bug report and told it to tell me whether this bug report is real or not. Maybe that's the way to keep up.

2727.645 - 2741.833 Bryan Cantrill

It's like some open source Jevons paradox or whatever. There's no money involved here, but I just mean the cost of creating PRs and projects and all of these things has dropped so much that the volume has just accelerated.

2742.269 - 2762.337 Adam Leventhal

Well, I also do think that with these open source projects especially, I mean, you know, God bless small communities. I would be almost intrigued by someone who's like, I'm going to use an LLM to file a bunch of bugs against illumos. You're like, that's weird. I mean, that's, like... talk to someone about that.

2762.357 - 2777.56 Adam Leventhal

Yeah. Talk to someone. Yeah. I mean, I'm almost not opposed. That's very okay. Versus, like, a project... I mean, certainly we saw this with Node, where, you know... I've been in very, very large projects with many, many contributors and very small projects.

2777.72 - 2797.837 Adam Leventhal

And there's a lot to be said about being in a small project, and a lot to be said about a project that doesn't attract as much attention, because it doesn't attract as much of that kind of negative attention either. So I'm sure there are some high-profile repos for whom this problem is really, really acute.

2797.978 - 2805.032 Adam Leventhal

And, you know, maybe that was that way with Ghostty and Mitchell, but for a lot of the stuff, at least that I work in, it's not really an acute problem.

2805.654 - 2805.754 Unknown

Yeah.
