
The Neuron: AI Explained

Inside the Secret Labs Where AI Learns to Work

25 Mar 2026

Transcription

Chapter 1: What is the main topic discussed in this episode?

0.031 - 11.574 Corey Knowles

Welcome, humans, to The Neuron: AI Explained. I'm your host, Corey Knowles, and I'm here as always alongside, with a spring in his step, our own Grant Harvey. How are you today, my friend?

12.055 - 35.975 Grant Harvey

I'm doing well. I'm doing well. I love that. That's a very visual, flowery description of me, which I welcomely embrace. How are you? Well, what are we talking about today, Grant? Today, we are talking about reinforcement learning environments, the training grounds where AI agents learn to plan, use tools, adapt, and make judgment calls across multi-step tasks.

36.556 - 46.373 Grant Harvey

These environments are quickly becoming the bottleneck for real-world AI performance. And in 2025, they were one of the most aggressively funded and least understood parts of the AI stack.

47.18 - 64.699 Corey Knowles

Our guest today is Nick Heiner, head of RL Environments at Surge AI, where he leads the company's reinforcement learning environments team. Before Surge, Nick was a founding engineer at Fixie, a senior engineer at Netflix working on UI platforms, and part of the U.S. Digital Service.

65.506 - 73.918 Grant Harvey

At Surge, he's helped build large-scale simulated workplaces like CoreCraft, environments used by frontier labs to test whether models can actually do knowledge work end-to-end.

74.358 - 87.497 Grant Harvey

He's also the co-author of Surge's recent research showing that even the best models fail roughly 40% of the time on real workplace tasks, with failures clustering around planning, adaptability, groundedness, and common sense.

88.372 - 90.294 Corey Knowles

Nick, welcome to The Neuron. Glad to have you.

90.855 - 98.183 Nick Heiner

Thank you. Happy to be here. You mentioned that it's one of the best funded but also least understood areas right now.

98.884 - 100.245 Corey Knowles

Do you agree with that assessment?

Chapter 2: What are reinforcement learning environments and why are they important?

111.919 - 115.763 Nick Heiner

Like, we're just piling in here, and there's still a lot to learn about the space.

116.468 - 131.035 Grant Harvey

Well, hopefully we can help out all of your VC friends and teach them a bit more direct from the horse's mouth here. I guess to start, how did you end up at Surge AI building RL environments? And what was the moment that you realized that this was the future of AI training?

132.719 - 151.116 Nick Heiner

So, I mean, the moment that I sort of left Netflix and went into AI startups in general was basically the moment, the first moment I used ChatGPT. And it was just, I'm sure everyone remembers where they were when they first used ChatGPT. I remember the moment. Right. It's just immediately obvious that this is something totally different.

151.096 - 165.92 Nick Heiner

And, you know, I love Netflix, but it just didn't feel like a time to be at a 25-year-old company in a well-established space. It's like this is a whole new field. So I came over to Surge, and at the time, everything was scaling up really fast.

166.602 - 184.542 Nick Heiner

Like, you know, Llama 2, GPT-3, sort of those initial models had come out, and everyone's looking at the scaling laws and saying, okay, now we just need to bump it up an order of magnitude, which means the entire supply chain needs to get bumped up an order of magnitude, of which we at Surge were a part. So when I first joined, I focused a lot on building out our expert network.

185.683 - 209.135 Nick Heiner

And that has a bunch of pieces to it. There's the actual recruiting, but then there's things like, how do you vet people? How do you see who's the best at what type of tasks? How do you apply quality checks to work at broader and broader scales? Then I transitioned to lead several of our client engagements. And that was during 2024.

209.175 - 228.505 Nick Heiner

And the big thing there was like, again, in 2023, a lot of these models were being produced by scrappy small bands of researchers at these labs. And then 2024, all those orgs scaled up from 20 geniuses to an org of 1000 people. And, you know, in much the same way, we had to scale up too.

228.865 - 249.075 Nick Heiner

And they sort of had new expectations of us, you know, high-level enterprise maturity, and then just being able to produce super high-quality data for, you know, 40 different research tracks at once instead of like a team of 20 that was focused on three. Wow. So that was a big part of my work in 2024 was sort of scaling my teams to do that.

249.055 - 265.034 Nick Heiner

And then 2025 was focusing on our environments and a lot of the actual work, but with a very strong through line to stuff that we had already been doing with labs. Some of it was sort of tying pieces together; some of it was just sort of a crystallization of a lot of other stuff we'd been doing.

Chapter 3: What challenges do AI models face in real workplace tasks?

332.367 - 349.296 Nick Heiner

Then there's reinforcement learning from human feedback, which is, when you golf, you know, you have an instructor, you're at the driving range, you take two shots and the coach tells you, okay, the first one was better. And they don't necessarily even tell you what was better about it. They just tell you one was better than the other.

349.416 - 361.52 Nick Heiner

And you, like, sort of try slightly different things every time, and you start to converge on what is the best thing to do. And then reinforcement learning environments take it a step further.

361.841 - 382.158 Nick Heiner

And so, instead of the driving range, where you're limited by the availability of the coach. Which, you know, to say what it actually is: you have humans looking at two responses from a model and choosing, you know, thumbs up, thumbs down. But that requires humans, right? Like, you have to spend millions of hours to do that.

383.319 - 403.485 Nick Heiner

The reinforcement learning environment is: you're sent out on the golf course by yourself. And you get feedback from the environment, like, okay, the ball went close to the target. Right. And in that way, you're able, again, to sort of self-teach in a sense, because you keep trying different things and then you keep getting that feedback of what worked and what didn't.

404.306 - 407.431 Nick Heiner

And yeah, you do that for a million hours and then all of a sudden you're a world-class golfer.
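
That golf analogy can be sketched as a toy RL loop. This is purely an illustration, not anything Surge actually runs: the environment, the reward function, and the hill-climbing "learner" are all invented here, but they show the key point that the feedback comes from the environment, not from a human coach.

```python
import random

def golf_env_reward(shot_strength, target=100.0):
    """Toy environment: reward is higher the closer the ball lands to the target."""
    return -abs(shot_strength - target)  # negative distance, so 0 is a perfect shot

def train(episodes=2000, seed=0):
    """Hill-climb on reward: try a slightly different shot, keep it if it scores better."""
    rng = random.Random(seed)
    strength = 10.0
    best = golf_env_reward(strength)
    for _ in range(episodes):
        candidate = strength + rng.uniform(-5, 5)  # try something slightly different
        reward = golf_env_reward(candidate)
        if reward > best:  # feedback from the environment, no human in the loop
            strength, best = candidate, reward
    return strength

print(f"learned shot strength: {train():.1f}")  # converges near the target of 100
```

The "self-teaching" Nick describes is exactly this accept-if-better loop, scaled up enormously and with far richer reward signals.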

408.252 - 415.041 Corey Knowles

Makes me think of the thumbs up, thumbs down, "Do you like this personality?" button on ChatGPT I see all the time.

415.358 - 433.587 Nick Heiner

Yes. And that is exactly what they're doing: they are collecting your user feedback. It's actually somewhat funny. You know, we've had experts in our network who spend a lot of time, you know, going into a lot of detail on these responses to assess which ones are better, and they get paid to do it.

433.938 - 449.972 Nick Heiner

And when they see ChatGPT asking for that same information for free, like, some of them have actually complained. I mean, they're just sort of venting; it's not a serious thing. But yes, that is exactly what they're doing: they're gathering training data.

449.952 - 465.683 Grant Harvey

Okay, that's good to know. Well, the RL example where you're going out onto the golf course and, based on the feedback that you get, adjusting your game: that, to me, feels the most similar to how we humans learn in general. Do you agree with that?

Chapter 4: How does Nick Heiner's experience shape his views on AI training?

590.69 - 596.381 Nick Heiner

And then it will faithfully write the rest of that document. And so post-training is where you teach it not to do that.

597.282 - 597.523 Corey Knowles

Yeah.

597.603 - 597.964 Nick Heiner

Got it.

598.465 - 599.587 Corey Knowles

Kind of the behavior end.

600.428 - 601.23 Nick Heiner

Yes, exactly.

601.27 - 627.728 Corey Knowles

Okay. That's fascinating. So when we talk about AI in the enterprise, there's this huge wave of optimism. 84% of business leaders say AI is going to transform their industry, and that's massive. But here's the reality. 93% of them are struggling to actually make it work. That's the gap, and that's exactly what Dell AI Factory with NVIDIA is built to close.

629.071 - 652.653 Corey Knowles

Dell calls it the world's broadest AI portfolio, and that's not marketing fluff. We're talking everything from AI-ready PCs to servers, storage, networking, services, all designed to work together. But what really matters is this. They've already helped implement more than 3,000 real-world AI deployments. This is proven operational AI.

653.214 - 674.288 Corey Knowles

They don't just drop hardware at your doorstep and wish you luck. Dell brings expert services at every single stage. Strategy, deployment, scaling, so you're not stuck in pilot mode wondering why nothing's moving. If your organization believes AI is the future, but you're still trying to bridge that execution gap, check out the Dell AI Factory with NVIDIA.

674.689 - 693.799 Corey Knowles

Learn more today at dell.com slash yourwaytoai. That's dell.com slash yourwaytoai. So what makes a good environment versus just writing test cases? What are the differences in a good and bad environment?

Chapter 5: What factors contribute to the effectiveness of RL environments?

1302.356 - 1302.556 Corey Knowles

Nope.

1303.157 - 1307.801 Nick Heiner

Right. Right. Exactly. You know, every, every like longitudinal nutrition study. Right.

1307.821 - 1325.937 Nick Heiner

But, you know, just like in biology, where people are working on cell simulators to be able to get around some of that. That is, again, the benefit of an RL environment: once you figure out how to verify these tasks, you have a simulated environment in which you can have, you know, many hours, many years of

1325.917 - 1346.912 Grant Harvey

simulation time compressed into a much shorter wall-clock time, and then try to get some of that signal. Okay, this brings me to actually one of my biggest questions, which kind of came up in conversation here. Are you doing any testing with, like, multi-agent swarms? Like, I know this is kind of the big hot topic; you know, OpenClaw's talking about this a lot.

1346.892 - 1362.049 Grant Harvey

There's even Moltbook, which is, like, all of the agents talking to each other. Like, would you ever simulate, you know, would you try to simulate a thousand people reading a textbook and, you know, how they actually react to it? And I have a follow-up question on that, but I'll just let you react.

1362.805 - 1380.964 Nick Heiner

Yeah, I mean, it's like the Codex app that just recently came out, where managing a swarm of agents is really a first-class concern. So yeah, I mean, the simpler form is what you see we've published already with CoreCraft, where you have one agent solving a customer support task at a time.

1381.705 - 1398.125 Nick Heiner

But yeah, when you get more nuanced, it's things like a financial markets simulator where you have multiple agents that are all participating in the market at the same time and in real time. And you just see sort of who comes out on top. So, yeah, that's definitely a very interesting area of research for us.
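
A stripped-down version of that multi-agent idea, with several policies acting in the same simulated market at once, might look like the following. Everything here is invented for illustration: the agents, the random-walk price process, and the final-wealth scoring are not how Surge's simulator works.

```python
import random

def simulate_market(strategies, rounds=500, seed=1):
    """Toy multi-agent market: each agent decides whether to buy one unit at the
    current price each round; a random-walk price moves between rounds, and
    final wealth (cash plus holdings at the closing price) ranks the agents."""
    rng = random.Random(seed)
    price = 100.0
    cash = {name: 1000.0 for name in strategies}
    units = {name: 0.0 for name in strategies}
    for _ in range(rounds):
        for name, decide in strategies.items():
            if decide(price) and cash[name] >= price:
                cash[name] -= price
                units[name] += 1.0
        price = max(1.0, price + rng.uniform(-2, 2))  # random-walk price, floored at 1
    return {name: cash[name] + units[name] * price for name in strategies}

# Two toy policies competing in the same environment, to "see who comes out on top."
wealth = simulate_market({
    "dip_buyer": lambda p: p < 95,   # buys whenever the price dips
    "never_buys": lambda p: False,   # holds cash the whole time
})
print(max(wealth, key=wealth.get))
```

The interesting research question Nick hints at is what happens when the strategies are themselves LLM agents reacting to each other in real time, rather than fixed lambdas.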

1399.407 - 1420.186 Grant Harvey

Related to that, do you have any sort of intuition or suspicion on whether the model that will be able to generalize across all of these different domains, to do a deep analysis and, you know, end with a PowerPoint presentation, will be a single model, or, like, five models that all work together? What's your take?

1420.206 - 1439.184 Nick Heiner

Yeah. Yeah. So it's important to distinguish here. Like when people say multiple models, sometimes they literally mean like a new model has been trained to make PowerPoints. And sometimes they mean it's the same LLM that just has like a different system prompt. It's just sort of like an agent that's pointed towards a different subtask. Right. Yeah.
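
The second case Nick describes, the same LLM pointed at different subtasks via different system prompts, can be sketched like this. The `call_llm` function and the prompt texts are placeholders, not a real API; the point is only that every "agent" shares one set of model weights.

```python
# Hypothetical sketch: one underlying model, specialized per subtask by system prompt.
SYSTEM_PROMPTS = {
    "analysis": "You are a financial analyst. Produce a rigorous written analysis.",
    "slides": "You are a presentation writer. Turn analysis into concise slides.",
}

def call_llm(system_prompt, user_message):
    """Stand-in for a real LLM API call; here it just echoes its configuration."""
    return f"[{system_prompt.split('.')[0]}] {user_message}"

def run_subtask(role, task):
    # Same model weights for every role; only the system prompt differs.
    return call_llm(SYSTEM_PROMPTS[role], task)

print(run_subtask("slides", "Summarize Q3 revenue drivers"))
```

Contrast this with the first case Nick mentions, where "a new model has been trained to make PowerPoints": there the weights themselves would differ, not just the prompt.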

Chapter 6: What is the significance of reward signals in AI training?

1493.244 - 1517.552 Nick Heiner

So so one thing we found, as you'll note in the write up, like we had said, a lot of the models behaved as if they were solving an academic problem. And this is interesting, but actually not surprising at all, because, you know, again, you are your objective function, right? Like you get what you're trained for. And a lot of benchmarks are fairly academic and contrived.

1517.802 - 1539.423 Nick Heiner

And this is a natural consequence of the fact that building a benchmark is incredibly expensive. And a lot of them are being done from an academic context that don't have huge budgets. And so like, you know, if you imagine like some of the questions that we were posing to the models here, they take a finance professional 20, 30, 40 hours to do.

1540.303 - 1563.939 Nick Heiner

So in order to build a benchmark, hundreds of questions like that, you need to find enough finance professionals and you need to pay them to spend 20 to 30, 40 hours per task. And so that's quite a lot. And frankly, until you've made an investment in having a really deep expert network and a lot of technology to produce great data with those people, it's just not feasible.
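
To put rough numbers on that, take a few hundred tasks at 20 to 40 expert-hours each. The hourly rate below is my assumption for illustration, not a figure from the episode, but any plausible rate makes the point:

```python
def benchmark_cost(num_tasks, hours_per_task, hourly_rate):
    """Back-of-the-envelope cost of an expert-built benchmark."""
    return num_tasks * hours_per_task * hourly_rate

# 300 tasks x 30 hours each x a hypothetical $150/hr finance-professional rate
print(f"${benchmark_cost(300, 30, 150):,}")  # $1,350,000
```

Seven figures for a single benchmark is exactly why, as Nick says, academic budgets can't cover it.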

1564.479 - 1581.798 Nick Heiner

And so that's why you see that a lot of benchmarks that have been used are like glorified SATs. And so, you know, that's why we see the models sort of behave in a very academic way. But when you put these real-world constraints on them, there's sort of that last-mile problem.

1582.339 - 1604.146 Nick Heiner

You know, I'll give you another example of this, which is that many coding agents do not want to use external libraries unless you really force them to. When you say, I have a bunch of, like, you know, chess puzzles and I want you to write a program that is going to solve them, the obvious thing to do is to get Stockfish, just write a little wrapper, and then spit that out.

1604.427 - 1623.336 Nick Heiner

If you're saying this is for a production system, I'm building a chess app, yeah, just get Stockfish. Models want to make their own chess solver. Yeah, they want to build it from scratch. Because that's what you would do in an academic setting. You're not being tested on your ability to use Stockfish. You know, it's like those memes, you know, guys will do anything to avoid going to therapy.

1624.277 - 1634.166 Nick Heiner

It's like, you know, similarly, it's like coding models will do anything to avoid using the already perfectly good available library.

1634.186 - 1650.1 Grant Harvey

But I will say I kind of respect that, with a situation like npm where, you know, it just got hacked with Shai-Hulud. And now I'm almost like, I don't want to use anything with npm, you know, as an amateur vibe coder. Yeah, for sure.

1650.661 - 1661.436 Nick Heiner

And I do think it's a great feature of these models that we will perhaps be reaching for left-pad less frequently, because you can just have the confidence that your model can write that 10-line function for you.
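
The left-pad case is a good illustration of how small that function really is. A Python equivalent of the infamous npm package fits in a few lines:

```python
def left_pad(s, length, fill=" "):
    """Pad s on the left with the fill character until it reaches the given length."""
    s = str(s)
    if len(s) >= length:
        return s  # already long enough; return unchanged
    return fill * (length - len(s)) + s

print(left_pad("42", 5, "0"))  # 00042
```

When a dependency is this trivial, having the model write it inline avoids supply-chain exposure entirely; the trade-off Nick is pointing at is between that and reinventing genuinely hard libraries like Stockfish.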

Chapter 7: How do reinforcement learning environments differ from traditional training methods?

1670.088 - 1686.861 Nick Heiner

But there's a tie-back to finance. Like, yeah, what we see is excellence on sort of the core thing you would do in school. But then, when there are a lot of details, or especially when the task is structured not as, I've given you everything in the prompt, all neatly bundled up.

1686.981 - 1707.199 Nick Heiner

But you need to go into our environment and, like, search through our Confluence to find, like, our standard procedure for how we model this scenario. Check your email for a note from the VP that said, you know, here's some other important context for how this analysis needs to be done. You know, we got five different CSVs from the client. One of them is the relevant one.

1707.219 - 1713.272 Nick Heiner

The other four are out of date now. Like when you add all that stuff in, you know, that's where the models tend to fall apart.
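
A task like that can be expressed as an environment spec with distractors plus a programmatic check. This is a toy sketch; the file names and the scoring rule are invented for illustration, and a real environment would verify far more than source selection.

```python
# Hypothetical task spec: five client CSVs are visible to the agent,
# but only one is current. The verifier checks grounding, not just the answer.
TASK = {
    "files": ["q1_old.csv", "q2_old.csv", "q3_old.csv", "q4_old.csv", "q4_final.csv"],
    "correct_file": "q4_final.csv",
}

def verify(agent_answer):
    """Reward 1.0 if the agent grounded its work in the up-to-date CSV, else 0.0."""
    return 1.0 if TASK["correct_file"] in agent_answer.get("sources", []) else 0.0

grounded = {"summary": "...", "sources": ["q4_final.csv"]}
stale = {"summary": "...", "sources": ["q2_old.csv"]}
print(verify(grounded), verify(stale))  # 1.0 0.0
```

Once a check like this exists, the environment can hand out reward automatically, which is what makes the "millions of practice hours without a human coach" loop possible.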

1714.374 - 1714.614 Grant Harvey

Yeah.

1715.857 - 1735.425 Corey Knowles

That makes sense. So I guess let's talk a little bit about, you know, kind of the business end here. Should companies be investing in building their own RL environments and training their own models? Or is that still a waste of resources in a lot of ways and maybe best left to Frontier Labs?

1735.405 - 1766.734 Nick Heiner

Yeah, so I will admit that, you know, I work at a company that produces RL environments. So if you're asking me, you know, I'm going to have a certain perspective here. But I think my perspective is also correct. And now I will share it. So, you know, I think it really comes down to what your previous guest, the Factory CTO, I think, was saying. Eno, yeah. Eno, yeah, yeah.

1766.754 - 1774.625 Nick Heiner

Where he was saying that making the code base agent-ready has a huge impact. And that absolutely matches what we've seen in our own research.

1774.646 - 1774.946 Corey Knowles

Yeah.

1774.966 - 1793.442 Nick Heiner

And so I don't think companies want to be building their own RL environments, because there's a lot of work and infrastructure that goes into that that is not really part of their core competency. What they should be doing is making their entire business agent-ready. And so that basically means that, like,
