Nick Heiner

The Neuron: AI Explained
Inside the Secret Labs Where AI Learns to Work

Yeah.

So essentially, an eval is structured as a set of golden answers: you have tasks, and for each task you have the expected outcome.

And as we were talking about earlier, the more interesting the task, the harder it is to construct that.
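The golden-set structure described here (tasks paired with expected outcomes) can be sketched as follows. This is a minimal illustration, not any lab's actual harness: `GoldenExample`, `run_eval`, `normalize`, and `toy_model` are hypothetical names, and scoring by exact string match after whitespace/case normalization is an assumption. That kind of check only works for tasks with a single checkable answer, which is why more open-ended tasks are harder to build golden sets for.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenExample:
    task: str      # the prompt or task given to the model
    expected: str  # the "golden" expected outcome


def normalize(text: str) -> str:
    # Assumed grading rule: ignore case and collapse whitespace before comparing.
    return " ".join(text.strip().lower().split())


def run_eval(model: Callable[[str], str], golden_set: list[GoldenExample]) -> float:
    """Fraction of tasks where the model's output matches the golden answer."""
    if not golden_set:
        raise ValueError("golden set is empty")
    correct = sum(
        normalize(model(ex.task)) == normalize(ex.expected) for ex in golden_set
    )
    return correct / len(golden_set)


golden_set = [
    GoldenExample("What is 2 + 3?", "5"),
    GoldenExample("Capital of France?", "Paris"),
    GoldenExample("Reverse the string 'abc'", "cba"),
]


def toy_model(task: str) -> str:
    # Hypothetical stand-in for a real model call.
    answers = {"What is 2 + 3?": "5", "Capital of France?": " paris "}
    return answers.get(task, "I don't know")


print(f"accuracy = {run_eval(toy_model, golden_set):.2f}")  # prints accuracy = 0.67
```

An open-ended task such as "write a good product description" has no single `expected` string to compare against, so this style of grader would need to be replaced by something like human review or a rubric.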

Yeah.

Because the more open-ended the evaluation is...

And yeah, that becomes substantially more difficult.

And frankly, many, many golden sets are wrong.

Right.

It just takes a lot of effort.

And again, if you have noise in the golden set, that really disrupts your development process.

You know, when labs release new models, they talk about their benchmark scores.

And sometimes you'll see that, on a certain benchmark, every model clusters around 80%.

And people will start to say, oh, the benchmark is saturated now.

And when they say saturated, what they mean is there's nothing more for us to learn.

The model is sort of as good as it's going to get.

And sometimes that's because the benchmark has a long tail of really hard tasks.

But sometimes it's because a lot of the tasks are just broken.

And you start with the benchmark and you're like, okay, I expect 20% of these are busted.

I just don't know which 20.

And then you train your model, you get to 80% and you're like, oh, those are the 20.
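The ceiling effect described here can be simulated in a few lines. This is a toy sketch under stated assumptions: it models a "broken" task only as one whose golden label is wrong (real broken tasks can also be ambiguous or unsolvable), and `simulate_broken_benchmark` and its parameters are hypothetical. Even a model that always produces the true answer tops out at 1 minus the broken fraction, and the tasks it "fails" are exactly the broken ones.

```python
import random


def simulate_broken_benchmark(n_tasks: int = 100, broken_fraction: float = 0.2, seed: int = 0):
    """Score a hypothetical perfect model against a golden set with some wrong labels."""
    rng = random.Random(seed)
    true_answers = [rng.randint(0, 9) for _ in range(n_tasks)]
    # Pick which tasks are broken; we don't "know" which ones in practice.
    broken = set(rng.sample(range(n_tasks), round(n_tasks * broken_fraction)))
    # Golden labels: correct for healthy tasks, deliberately wrong for broken ones.
    golden = [(ans + 1) % 10 if i in broken else ans for i, ans in enumerate(true_answers)]
    # A hypothetical perfect model always outputs the true answer.
    predictions = true_answers
    failed = {i for i in range(n_tasks) if predictions[i] != golden[i]}
    score = 1 - len(failed) / n_tasks
    return score, failed, broken


score, failed, broken = simulate_broken_benchmark()
print(f"perfect model scores {score:.0%}")  # prints perfect model scores 80%
print(failed == broken)  # prints True
```

In this toy setup, the residual failures of a strong model point directly at the broken tasks, which is the sense in which reaching 80% tells you "those are the 20."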