Nick Heiner
Yeah.
So essentially the structure of an eval is a set of golden answers: you have tasks, and then you have the expected outcome for each.
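A golden set like that can be sketched as a list of task/expected-outcome pairs with a simple exact-match scorer. This is a minimal illustration, not any particular eval framework; all the names here are made up:

```python
# A minimal sketch of an eval: a golden set of tasks paired with
# expected outcomes, scored by exact match. Names are illustrative.
golden_set = [
    {"task": "What is 2 + 2?", "expected": "4"},
    {"task": "What is the capital of France?", "expected": "Paris"},
]

def run_eval(model_fn, golden_set):
    """Score a model against the golden set; returns fraction correct."""
    correct = sum(
        1 for item in golden_set
        if model_fn(item["task"]).strip() == item["expected"]
    )
    return correct / len(golden_set)

# A toy "model" that answers "4" to everything scores 0.5 here.
score = run_eval(lambda task: "4", golden_set)
```

Real evals replace exact match with fuzzier grading (model-graded rubrics, unit tests, etc.), which is exactly where open-ended tasks get hard.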
And as we were talking about earlier, the more interesting the task, the harder it is to construct that.
Yeah.
Because the more open-ended the evaluation is, the harder it gets. And yeah, that does become substantially difficult.
And frankly, like many, many golden sets are wrong.
Right.
It just takes a lot of effort.
And again, if you have noise, that really disrupts your development process.
You know, you'll see when labs release new models, they talk about their benchmark scores.
And sometimes you'll see that for a certain benchmark, everything will cluster around 80%.
And people will start to say, oh, the benchmark is saturated now.
And when they say saturated, what they mean is there's nothing more for us to learn.
Like the model is sort of as good as it's going to get.
And sometimes it's because the benchmark has a long tail of really hard things.
But sometimes it's because a lot of the tasks are just broken.
And you start with the benchmark and you're like, okay, I expect 20% of these are busted.
I just don't know which 20.
And then you train your model, you get to 80% and you're like, oh, those are the 20.
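The arithmetic behind that plateau can be sketched in a couple of lines. This is just a toy illustration of the reasoning, with hypothetical numbers:

```python
# Toy illustration: if some fraction of a benchmark's golden answers are
# wrong ("busted"), a model that solves every solvable task still plateaus
# below 100%, because the broken tasks can never be scored as correct.
def score_ceiling(total_tasks, broken_fraction):
    """Max achievable accuracy when broken tasks always score as wrong."""
    solvable = total_tasks * (1 - broken_fraction)
    return solvable / total_tasks

# With 20% of tasks busted, scores cluster near 80% no matter how good
# the model gets -- which can look like "saturation" from the outside.
ceiling = score_ceiling(1000, 0.20)
```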