Nick Heiner
There they are.
But that was giving you noise the whole way as you were getting up there.
So one thing we do at Surge is we try to have 100% correctness, 100% of tasks that actually work, instead of just accepting that degree of noise.
So that's probably my biggest recommendation for people trying to build their own eval sets. I think there's a certain temptation where it's like, building the eval set isn't fun.
Building the agent is what's fun.
Yeah.
But like, yeah, you shouldn't skip your vegetables.
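(As a concrete illustration of that "every task should actually work" bar, here's a minimal, hypothetical sketch of an eval-set sanity check. The task structure and grader are assumptions for illustration, not an actual Surge API.)

```python
# Hypothetical sketch: verify every eval task is well-formed by
# checking that its own reference answer passes its grader.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    prompt: str
    reference_answer: str
    grade: Callable[[str], bool]  # returns True if an answer passes

def validate_eval_set(tasks: list[EvalTask]) -> list[int]:
    """Return indices of broken tasks: ones whose reference answer
    fails their own grader. A healthy eval set returns []."""
    broken = []
    for i, task in enumerate(tasks):
        if not task.grade(task.reference_answer):
            broken.append(i)
    return broken

tasks = [
    EvalTask(
        prompt="What is 2 + 2?",
        reference_answer="4",
        grade=lambda ans: ans.strip() == "4",
    ),
]
assert validate_eval_set(tasks) == []  # 100% of tasks should pass
```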
Yeah, I mean, they can be benchmarks, right?
Like at a high level, a benchmark is just a series of challenges for the model, plus a way to score them.
So RL environments are just a way to do that.
And yeah, in the fullness of time, do most benchmarks become RL environments?
I think it's certainly possible.
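(To make the "challenges plus scores" framing concrete, here's a hypothetical sketch of a single-turn benchmark item packaged as an RL-style environment. All names and the interface are illustrative assumptions.)

```python
# Hypothetical sketch: a benchmark item as an RL-style environment,
# where reset() poses the challenge and step() returns the score.

from typing import Callable

class QABenchmarkEnv:
    """A single-turn QA task: one step, reward 1.0 if correct."""

    def __init__(self, question: str, answer: str):
        self.question, self.answer = question, answer

    def reset(self) -> str:
        return self.question  # the challenge posed to the model

    def step(self, action: str) -> tuple[str, float, bool]:
        reward = 1.0 if action.strip() == self.answer else 0.0
        return "", reward, True  # episode ends after one step

def run_benchmark(envs: list[QABenchmarkEnv],
                  model: Callable[[str], str]) -> float:
    """Average reward across tasks: the benchmark score."""
    total = 0.0
    for env in envs:
        obs = env.reset()
        _, reward, _ = env.step(model(obs))
        total += reward
    return total / len(envs)

# Usage: a trivial stand-in "model" that returns a canned answer.
envs = [QABenchmarkEnv("Capital of France?", "Paris")]
print(run_benchmark(envs, lambda prompt: "Paris"))  # 1.0
```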
You know, it's sort of like in software development where you have your test pyramid, where at the bottom of the pyramid you have your unit tests, which are very fine-grained and give you very specific feedback.
And at the top of the pyramid, you have your integration tests, which test the whole system.
And the reason it's shaped like a pyramid is that the integration tests are much more expensive and slow to run.
And when something fails, you don't know exactly what the problem is necessarily.
But they're also way less brittle than the unit tests because they are tracking sort of closer to your end-to-end value.
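(A toy sketch of that pyramid tradeoff, with made-up functions: the unit test pinpoints failures but is tied to internals, while the end-to-end test survives refactors at the cost of vaguer feedback.)

```python
# Hypothetical example of the test-pyramid tradeoff.

def tokenize(s: str) -> list[str]:
    return s.split()

def word_count(s: str) -> int:
    return len(tokenize(s))

def test_tokenize_unit():
    # Unit test: cheap, pinpoints a failure to tokenize(),
    # but brittle if tokenization internals change.
    assert tokenize("a b") == ["a", "b"]

def test_word_count_integration():
    # Integration-style test: checks end-to-end behavior and
    # survives refactors, but a failure doesn't say which layer broke.
    assert word_count("the quick brown fox") == 4

test_tokenize_unit()
test_word_count_integration()
```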
And so I sort of see different benchmarks as having different spots in that pyramid, where, yeah, you need your RL environments to track, at the end of the day, can this thing be a lawyer?
But sometimes you want more specific benchmarks, like instruction following or groundedness, that will help you tease out, okay,
my latest model checkpoint had a big regression on the lawyer abilities, and it turns out it also had a big regression on the instruction following abilities, so that's probably where the problem is.
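(A hypothetical sketch of that triage step: compare two checkpoints across benchmarks at different pyramid levels and flag the co-occurring drops. The benchmark names and scores are made up for illustration.)

```python
# Hypothetical sketch: localize an end-to-end regression by diffing
# checkpoint scores across coarse and fine-grained benchmarks.

def diff_checkpoints(prev: dict[str, float], curr: dict[str, float],
                     threshold: float = 0.05) -> dict[str, float]:
    """Return benchmarks whose score dropped by more than `threshold`."""
    return {name: round(curr[name] - prev[name], 3)
            for name in prev
            if prev[name] - curr[name] > threshold}

prev = {"lawyer_e2e": 0.71, "instruction_following": 0.88, "groundedness": 0.90}
curr = {"lawyer_e2e": 0.58, "instruction_following": 0.70, "groundedness": 0.89}

# The end-to-end "lawyer" drop co-occurs with an instruction-following
# drop, pointing at where the regression likely originates.
print(diff_checkpoints(prev, curr))
# {'lawyer_e2e': -0.13, 'instruction_following': -0.18}
```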