Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

EA Forum Podcast (Curated & popular)

“Mitigating Risks from Rouge AI” by poppinfresh

01 Apr 2025

Description

Introduction Misaligned AI systems, which have a tendency to use their capabilities in ways that conflict with the intentions of both developers and users, could cause significant societal harm. Identifying them is seen as increasingly important to inform development and deployment decisions and design mitigation measures. There are concerns, however, that this will prove challenging. For example, misaligned AIs may only reveal harmful behaviors in rare circumstances, or perceive detection attempts as threatening and deploy countermeasures – including deception and sandbagging – to evade them. For these reasons, developing a range of efforts to detect misaligned behavior, including power-seeking, deception, and sandbagging, among other capabilities, have been proposed. One important indicator, though, has been hiding in plain sight for years. In this post, we identify an underappreciated method that may be both necessary and sufficient to identify misaligned AIs: whether or not they've turned red, i.e. gone rouge. In [...] ---Outline:(00:10) Introduction(01:42) Historical Evidence for Rouge AI(02:36) Recent Empirical Work(03:46) Potential Countermeasure(03:49) The EYES Eval(04:55) EYES Eval Demonstration(05:44) Future Research Directions(06:46) Conclusion --- First published: April 1st, 2025 Source: https://forum.effectivealtruism.org/posts/uKKoj9iqj2cWKsjrt/mitigating-risks-from-rouge-ai --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.