Ryan Greenblatt

"My picture of the present in AI" by ryan_greenblatt

This is mostly coming from misalignment that emerges in the context of massive long-running agent scaffolds, something like the memetic spread threat model discussed here.

922.255

LessWrong (Curated & Popular)

More speculative.

931.204

My current sense is that AI companies overall probably have an overly optimistic sense of how good of a job they've done on mundane alignment while the teams working on the issue have a mostly reasonable view of this.

932.797

This seems especially true for Anthropic.

944.212

This is due to a mix of AIs, especially Anthropic's systems, acting like sympathetic characters (and very plausibly being sympathetic characters) and motivated reasoning about the company doing well in general.

947.356

I think it's likely, 60%, that in the next six months a very well set up and somewhat hand-engineered agent scaffold that uses the best AI could succeed in fully autonomously creating a strong end-to-end exploit against one of the top 10 most important software targets, for example Chrome one-click, Safari one-click, iMessage zero-click, etc.,

968.16

when given $1 million in inference compute per target.

987.539

This assumes there aren't issues with refusals (for example, the AI is helpful-only) and that this AI is given this task before this AI is used to patch the relevant software.

991.746

My largest uncertainty here is around how effectively software will get patched by earlier AIs.

1002.222

I'm uncertain about how much effort will be spent on leveraging AIs to find and patch vulnerabilities.

1008.38

I'm also uncertain about the extent to which patching vulnerabilities found by earlier models will transfer to preventing somewhat more capable models, possibly with more inference compute, from finding vulnerabilities.

1014.493

More strongly, I think that AIs in the next six months are quite likely, 80%, to be able to succeed at this objective for a January 2026 version of the corresponding software without internet access and assuming no contamination.

1026.098

Many difficult parts of cyber offence seem particularly well suited to AI strengths: they are relatively checkable, they benefit from extensive knowledge, and parts are highly parallelisable.

1041.08

I don't think the rate of cybercrime is elevated right now, though the rate of vulnerability discovery is very elevated.

1051.175

I don't currently expect a very large increase in cybercrime by end of year, though I think it's possible and a 2x increase is quite plausible, roughly 30%.

1058.248

I expect the situation with AI cyber capabilities will seem extreme to security professionals and to maintainers of commonly used software that tries to be secure, for example Chrome, Linux, etc., but will have almost no direct effect on random people in the US and won't even have much effect on software engineers at big tech companies.

1067.965

Wannabe bioterrorists without much bio-expertise who are very good at using LLMs are probably seriously uplifted by unsafeguarded versions of the current best AIs, as in, helpful-only models, but no one knows how large this effect is and how good at using LLMs you need to be.

1090.535

After taking into account safeguards, I don't think current publicly released LLMs, as of April 1, 2026, have more than doubled bioterror risk, though I'm pretty unsure.

1107.233

Also, even a 2x increase would be from a relatively low baseline.

1118.776

We don't have a great sense of what this baseline is in terms of expected fatalities, though we can bound the frequency of bioterror attempts reasonably well.

1123.585