Sholto Douglas

👤 Speaker

1567 total appearances


Podcast Appearances

Dwarkesh Podcast

Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind

One of the bottlenecks for AI progress that many people identify is the inability of these models to perform tasks on long horizons, which means engaging with the task for many hours or even many weeks or months, where if I have, I don't know, an assistant or an employee or something, I can just tell them to do a thing and they work on it for a while.

357.983

And AI agents haven't taken off for this reason, from what I understand.

378.993

So how linked are long context windows and the ability to perform well on them and the ability to do these kinds of long horizon tasks that require you to engage with

382.503

an assignment for many hours?

392.105

Or are these unrelated concepts?

393.768

And there are ways that you can find a smooth metric for that.

466.823

Yeah, HumanEval or whatever, in the GPT-4 paper, the coding problems, they measure it by... log pass rate, right?

470.347

Exactly.

476.093

For the audience, the context on this is...

477.434

Basically, the idea is that when you're measuring how much progress there has been on a specific task, like solving coding problems, you upweight it when it gets it right.

480.31

Only one in a thousand times.

491.489

You don't give it a one in a thousand score because it's like, oh, it got it right some of the time.

492.591

And so the curve you see is like it gets it right.

495.937

One in a thousand, then one in a hundred, then one in ten and so forth.
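The smooth metric described here can be sketched in a few lines. This is an illustration only, assuming "log pass rate" means the natural logarithm of the fraction of sampled attempts that pass; the function name `log_pass_rate` and the sample counts are hypothetical and are not taken from the GPT-4 paper.

```python
import math


def log_pass_rate(successes: int, attempts: int) -> float:
    """Natural log of the empirical pass rate (hypothetical helper).

    Returns -inf when no attempt passes, since log(0) is undefined.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if successes < 0 or successes > attempts:
        raise ValueError("successes must be between 0 and attempts")
    if successes == 0:
        return float("-inf")
    return math.log(successes / attempts)


# Illustrative progression from the conversation:
# one in a thousand, one in a hundred, one in ten, then always right.
progression = [(1, 1000), (1, 100), (1, 10), (1, 1)]
log_rates = [log_pass_rate(s, n) for s, n in progression]

for (s, n), value in zip(progression, log_rates):
    print(f"{s}/{n}: raw pass rate = {s / n:.3f}, log pass rate = {value:.3f}")
```

On the raw scale the first two steps look like almost no progress (0.001 to 0.01), while on the log scale each tenfold improvement is an equal-sized step, which is why the curve looks smooth.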

497.68

So actually, I want to follow up on this.

501.421

So if your claim is that the AI agents haven't taken off because of reliability rather than long horizon task performance, isn't the

503.687

Lack of reliability when a task is chained on top of another task on top of another task.

513.689

Isn't that exactly the difficulty with long horizon tasks?

519.677

Is it that you have to do 10 things in a row or 100 things in a row, diminishing the reliability of any one of them?

522.22

Or the probability goes down from 99.99% to 99.9%.
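The compounding effect raised in this question can be worked through numerically. The sketch below assumes every step succeeds independently with the same probability, which is a simplification introduced here for illustration and not something stated in the episode; `chain_success_probability` is a hypothetical helper name.

```python
def chain_success_probability(per_step: float, steps: int) -> float:
    """Probability that all `steps` chained steps succeed.

    Assumes independent steps with identical per-step reliability.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# Compare the two per-step reliabilities mentioned in the conversation
# across chains of increasing length.
for per_step in (0.9999, 0.999):
    for steps in (10, 100, 1000):
        overall = chain_success_probability(per_step, steps)
        print(f"per-step {per_step:.2%}, {steps:>4} steps: overall {overall:.2%}")
```

Under these assumptions, a drop in per-step reliability that looks negligible for a single step becomes a large gap in overall success once hundreds or thousands of steps are chained.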

528.968