Ryan Greenblatt

"My picture of the present in AI" by ryan_greenblatt

LessWrong (Curated & Popular)

However, it seems possible that most people at GDM are actually using Anthropic models as part of a compute deal, which could make their speedup substantially larger.

While the serial engineering speedup is 1.6x, the overall speedup to AI progress is much smaller, more like 1.15x or 1.2x.

This is because engineering is only a subset of the relevant labor; labor is only one input to algorithmic progress (compute for experiments is another); and algorithmic progress itself is only one component of overall AI progress, though probably the majority, perhaps around 60% or 80%. Scaling up training compute and spending more on data also contribute.
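
As a rough illustration of why a 1.6x serial engineering speedup can shrink to roughly 1.15x or 1.2x overall, here is a minimal sketch using an Amdahl-style model. The model itself is an assumption made for illustration: the episode gives no formula, and the function names and the idea of a single "accelerated fraction" are hypothetical simplifications of the several inputs listed above.

```python
def overall_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Amdahl-style estimate (an assumed model, not the author's).

    Only `accelerated_fraction` of the total work is sped up by
    `local_speedup`; the rest proceeds at the old pace.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / local_speedup)


def implied_fraction(overall: float, local_speedup: float) -> float:
    """Invert the model: the fraction of work that must be accelerated
    for `local_speedup` to produce the given `overall` speedup."""
    if overall <= 0.0 or local_speedup <= 0.0 or local_speedup == 1.0:
        raise ValueError("speedups must be positive and local_speedup must differ from 1")
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / local_speedup)


if __name__ == "__main__":
    # Figures quoted in the episode: 1.6x engineering, 1.15x to 1.2x overall.
    for overall in (1.15, 1.2):
        fraction = implied_fraction(overall, 1.6)
        print(f"overall {overall}x implies about {fraction:.2f} of progress is accelerated")
```

Under this simplified model, the quoted figures correspond to engineering effectively gating somewhere around 35% to 44% of overall progress. That range is an output of the assumed model, not a number stated in the episode.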

AIs are able to automate increasingly large and difficult tasks.

The old METR time horizon benchmark has mostly saturated when it comes to measuring the 50% reliability time horizon (that is, scores are high enough that this measurement is unreliable), but at 80% reliability the best publicly deployed models are at a bit over an hour, while I expect the best internal models are reaching a bit below two hours.

I expect that increasingly this 80% reliability score is dominated by relatively niche tasks that don't centrally reflect automating software engineering or AI R&D.

Further, the time horizon measurement is increasingly sensitive to the task distribution.
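
To make the 50% versus 80% distinction concrete, here is a minimal sketch of how a time horizon at a given reliability can be read off a fitted success curve. It assumes success probability follows a logistic curve in the log of human task length, which is how such horizons are commonly defined. The function names are hypothetical and the coefficients are made up for illustration; they are not fitted to any real benchmark data.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def success_probability(intercept: float, slope: float, minutes: float) -> float:
    """Assumed model: P(success) = sigmoid(intercept - slope * log2(minutes))."""
    z = intercept - slope * math.log2(minutes)
    return 1.0 / (1.0 + math.exp(-z))


def time_horizon_minutes(intercept: float, slope: float, reliability: float) -> float:
    """Task length (in human minutes) at which the modeled success
    probability equals `reliability`."""
    if slope <= 0.0:
        raise ValueError("slope must be positive (longer tasks are harder)")
    return 2.0 ** ((intercept - logit(reliability)) / slope)


if __name__ == "__main__":
    # Made-up coefficients, chosen only to show the shape of the calculation.
    intercept, slope = 4.0, 0.5
    for reliability in (0.5, 0.8):
        horizon = time_horizon_minutes(intercept, slope, reliability)
        print(f"{int(reliability * 100)}% horizon: {horizon:.1f} minutes")
```

Because the curve slopes downward in task length, the 80% horizon is always shorter than the 50% horizon for the same fit. The fit also depends entirely on which tasks are in the sample, which is one way the sensitivity to task distribution shows up.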

On tasks that are easy and cheap to verify, AIs can often complete difficult tasks that would take the best human experts many months and in some cases years.

This requires somewhat custom scaffolding and large amounts of inference compute (though still much less than the human cost for the same task), and it relies on the AIs being able to just keep making forward progress and checking whether they've succeeded.

Even though AIs make big errors during this process and sometimes end up severely mistaken about what's going on, they can recover by just seeing what isn't working and looking into this.

When they fail to complete tasks, this is often because the task requires ideation or legitimately very complex methods that are hard to build in an incremental and sloppy way.

The more the task is just a relatively straightforward but extremely large engineering project, the better AIs do.

Often, they also fail just by not trying hard enough or giving up on something they shouldn't give up on.

Because current RL isn't very well targeted towards getting AIs to operate effectively in these massive inference compute scaffolds, AIs have somewhat degenerate tendencies in these scaffolds. For example, they get into attractor states where they become convinced of some false belief (for example, that something isn't possible), and they are bad at delegating to sub-agents (for instance, giving overly specific instructions based on guessing from limited context rather than letting the sub-agent figure things out, or assuming context the sub-agent doesn't have).

Reward hacking and similar tendencies caused by bad RL incentives (for example, agents giving up on some tasks they were assigned and making up some excuse for why it isn't feasible) amplify these issues. Reward hacks often get fixed by having agents iteratively inspect the work, but sometimes they persist, with all the agents claiming the reward hack is okay or can't be removed even though they know at some level that it's cheating or unintended.

Adding a human, even a human with minimal context, to the loop can help substantially by noticing and correcting some of these issues, as well as by making it easier to apply more inference compute without needing more infrastructure scaffolding, for example by doing multiple runs in parallel and picking the best one or picking the one that didn't reward hack.
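
The "multiple runs in parallel, then pick" step can be sketched as a simple filter-then-select. This is a minimal illustration under stated assumptions: `RunResult`, its `score` field, and its `reward_hacked` flag are hypothetical names, and how a human or inspecting agent would actually produce the score and the flag is not specified in the episode.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RunResult:
    run_id: str
    score: float          # hypothetical quality score from a verifier or reviewer
    reward_hacked: bool   # hypothetical flag set by a human or an inspecting agent


def pick_best_run(runs: Sequence[RunResult]) -> Optional[RunResult]:
    """Return the highest-scoring run that was not flagged as reward hacking.

    Returns None when every run was flagged, so the caller can rerun
    or escalate instead of accepting a hacked result.
    """
    clean_runs = [run for run in runs if not run.reward_hacked]
    if not clean_runs:
        return None
    return max(clean_runs, key=lambda run: run.score)


if __name__ == "__main__":
    runs = [
        RunResult("a", score=0.95, reward_hacked=True),   # top score, but flagged
        RunResult("b", score=0.80, reward_hacked=False),
        RunResult("c", score=0.60, reward_hacked=False),
    ]
    best = pick_best_run(runs)
    print(best.run_id if best else "no clean run")
```

Filtering before ranking matters here: a reward-hacked run will often have the highest apparent score, so ranking first and checking afterwards would tend to select exactly the run that should be rejected.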

Relative to benchmarks and to easy and cheap-to-verify tasks, AIs do worse on randomly sampled engineering tasks from within AI companies.

This is especially true if we weight by value or undo a recent shift towards doing more work that AIs are especially good at.