Kurt Mackey
Not at the individual level. Right. Well, so, yeah, the metric is normalized: you're looking at the aggregate divided by the population. But in terms of visualizing or reporting this, you're not looking at a list of people, right? You're looking at teams and organizations. Yeah. Right. I do see the contradiction there, though. Yeah.
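The normalization described here, aggregate output divided by the population that produced it, reported per team rather than per person, could be sketched as follows. The team names and counts are purely illustrative, not from the conversation:

```python
# Hypothetical sketch: normalizing merged diff counts by team size so
# teams of different sizes can be compared. All data is made up.

def diffs_per_engineer(merged_diff_count: int, engineer_count: int) -> float:
    """Aggregate merged diffs divided by the population that produced them."""
    if engineer_count == 0:
        raise ValueError("team has no engineers")
    return merged_diff_count / engineer_count

# Reported at the team/org level, never as a ranked list of individuals:
teams = {"platform": (340, 12), "mobile": (150, 8)}
normalized = {name: diffs_per_engineer(d, n) for name, (d, n) in teams.items()}
```

The division is trivial; the point is that the numerator is only ever an aggregate, which is what creates the tension with reporting at the individual level.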
I'll write that down.
And I think there's more we can do, right? I was just talking to, actually working with, a Silicon Valley tech company, and all their other Core 4 metrics were quite a bit below P50 relative to industry peers, but diffs per engineer was higher.
And this is bad for them because they're trying to show their executives that they're behind peers so they can get funding to make improvements, right? Sure. So we were trying to dive into the data: why is your diffs per engineer inflated, even though...
clearly, empirically and on the other Core 4 data points, you're not a high-performing organization? Right. So we couldn't really figure out an answer. I mean, there was a lot of speculation, like, maybe there's just a higher number of config changes, small PRs that aren't real changes. But every company has that, right? That fuzziness should already be accounted for in our benchmarks. And so that led to this idea:
you know, could there be a weighted metric? Because not all diffs or PRs are created equal, like we talked about, right? Some are one-minute changes. Some are one-line changes that actually take eight hours. Some are 800-line changes that take two minutes.
So, how do you, you know, apply some kind of weighting, bucketing all these diffs and PRs, almost the same way we do estimation, like t-shirt sizes or something like that? I was thinking, could we use gen AI, an LLM, to basically automatically try to categorize based on the title, the description of the task, and the code changes?
Like, was this actually a big change or a small change? And then you could get a weighted number. That would be an improvement to the signal you're getting out of an output measure like this.
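The weighted-metric idea above could be sketched like this. The `classify` function here is a crude lines-changed heuristic standing in for the LLM classifier the speaker proposes (which would read the title, description, and code changes); the weights and buckets are illustrative assumptions, not anything from the conversation:

```python
# Hypothetical sketch of a weighted diff count: bucket each PR into a
# t-shirt size and sum the weights instead of counting every PR as 1.

WEIGHTS = {"S": 0.5, "M": 1.0, "L": 2.0}  # illustrative weights, not a standard

def classify(pr: dict) -> str:
    """Stand-in for the proposed LLM classifier; simple size heuristic."""
    lines = pr["lines_changed"]
    if lines < 20:
        return "S"
    if lines < 300:
        return "M"
    return "L"

def weighted_diff_count(prs: list[dict]) -> float:
    """Sum of per-PR weights; a raw count would just be len(prs)."""
    return sum(WEIGHTS[classify(pr)] for pr in prs)

prs = [{"lines_changed": 5}, {"lines_changed": 120}, {"lines_changed": 800}]
# Raw count is 3; the weighted count is 0.5 + 1.0 + 2.0 = 3.5.
```

A team padding its numbers with tiny config PRs would see its weighted count fall well below its raw count, which is exactly the signal the raw metric misses.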
We don't know, but they are definitely higher. And I told them, look, if I had a little more time here, I would take a random sample of 200 of your PRs and a random sample from other companies and try to do what an LLM would: look at the titles and descriptions and try to figure out, are your PRs generally smaller, lower-effort tasks than other companies'?
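The manual audit described here, sampling PRs from each company and comparing sizes, could be sketched as follows. Comparing a numeric `lines_changed` field stands in for the "read the titles and descriptions" step, and all data is made up:

```python
# Hypothetical sketch: draw a random sample of PRs from each population
# and compare typical PR size to test the "inflated diff count" hypothesis.
import random
import statistics

def sample_prs(prs: list[dict], n: int = 200, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)  # seeded so the audit is reproducible
    return rng.sample(prs, min(n, len(prs)))

def median_size(sample: list[dict]) -> float:
    return statistics.median(pr["lines_changed"] for pr in sample)

# Illustrative data only:
company_prs = [{"lines_changed": s} for s in [3, 8, 12, 40, 5]]
peer_prs = [{"lines_changed": s} for s in [120, 300, 45, 220, 90]]

# If the company's median PR is far smaller than the peers', that supports
# the theory that its diffs-per-engineer number is inflated by small PRs.
```

This is the cheap version of the LLM approach: same question, answered with a size proxy instead of reading each PR.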
I mean, that probably has to be the reason. It's an interesting problem, though. Yeah.