AXRP - the AI X-risk Research Podcast
Episodes
46 - Tom Davidson on AI-enabled Coups
07 Aug 2025
Could AI enable a small group to gain power over a large country, and lock in their power permanently? Often, people worried about catastrophic risks ...
45 - Samuel Albanie on DeepMind's AGI Safety Approach
06 Jul 2025
In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". ...
44 - Peter Salib on AI Rights for Human Safety
28 Jun 2025
In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, a...
43 - David Lindner on Myopic Optimization with Non-myopic Approval
15 Jun 2025
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward ...
42 - Owain Evans on LLM Psychology
06 Jun 2025
Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligne...
41 - Lee Sharkey on Attribution-based Parameter Decomposition
03 Jun 2025
What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms w...
40 - Jason Gross on Compact Proofs and Interpretability
28 Mar 2025
How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing....
38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
01 Mar 2025
In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not front...
38.7 - Anthony Aguirre on the Future of Life Institute
09 Feb 2025
The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the A...
38.6 - Joel Lehman on Positive Visions of AI
24 Jan 2025
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can?...
38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
20 Jan 2025
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look...
38.4 - Shakeel Hashim on AI Journalism
05 Jan 2025
AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this epi...
38.3 - Erik Jenner on Learned Look-Ahead
12 Dec 2024
Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neur...
39 - Evan Hubinger on Model Organisms of Misalignment
01 Dec 2024
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to unders...
38.2 - Jesse Hoogland on Singular Learning Theory
27 Nov 2024
You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode,...
38.1 - Alan Chan on Agent Infrastructure
16 Nov 2024
Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan ta...
38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
14 Nov 2024
Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society...
37 - Jaime Sevilla on AI Forecasting
04 Oct 2024
Epoch AI is the premier organization that tracks the trajectory of AI - how much compute is used, the role of algorithmic improvements, the growth in ...
36 - Adam Shai and Paul Riechers on Computational Mechanics
29 Sep 2024
Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does th...
New Patreon tiers + MATS applications
28 Sep 2024
Patreon: https://www.patreon.com/axrpodcast MATS: https://www.matsprogram.org Note: I'm employed by MATS, but they're not paying me to make this video...
35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
24 Aug 2024
How do we figure out what large language models believe? In fact, do they even have beliefs? Do those beliefs have locations, and if so, can we edit t...
34 - AI Evaluations with Beth Barnes
28 Jul 2024
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misb...
33 - RLHF Problems with Scott Emmons
12 Jun 2024
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have...
32 - Understanding Agency with Jan Kulveit
30 May 2024
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about ...
31 - Singular Learning Theory with Daniel Murfet
07 May 2024
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Baye...
30 - AI Security with Jeffrey Ladish
30 Apr 2024
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How ca...
29 - Science of Deep Learning with Vikrant Varma
25 Apr 2024
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having t...
28 - Suing Labs for AI Risk with Gabriel Weil
17 Apr 2024
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is develop...
27 - AI Control with Buck Shlegeris and Ryan Greenblatt
11 Apr 2024
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world---or in other words, ...
26 - AI Governance with Elizabeth Seger
26 Nov 2023
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democ...
25 - Cooperative AI with Caspar Oesterheld
03 Oct 2023
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage th...
24 - Superalignment with Jan Leike
27 Jul 2023
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top research...
23 - Mechanistic Anomaly Detection with Mark Xu
27 Jul 2023
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark X...
Survey, store closing, Patreon
28 Jun 2023
Very brief survey: bit.ly/axrpsurvey2023 Store is closing in a week! Link: store.axrp.net/ Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast
22 - Shard Theory with Quintin Pope
15 Jun 2023
What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look...
21 - Interpretability for Engineers with Stephen Casper
02 May 2023
Lots of people in the field of machine learning study 'interpretability', developing tools that they say give us useful information about neural netwo...
20 - 'Reform' AI Alignment with Scott Aaronson
12 Apr 2023
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with...
Store, Patreon, Video
07 Feb 2023
Store: https://store.axrp.net/ Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Video: https://www.youtube.com/watch?v=...
19 - Mechanistic Interpretability with Neel Nanda
04 Feb 2023
How good are we at understanding the internal computation of advanced machine learning models, and do we have a hope of getting better? In this episod...
New podcast - The Filan Cabinet
13 Oct 2022
I have a new podcast, where I interview whoever I want about whatever I want. It's called "The Filan Cabinet", and you can find it wherever you listen...
18 - Concept Extrapolation with Stuart Armstrong
03 Sep 2022
Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending ...
17 - Training for Very High Reliability with Daniel Ziegler
21 Aug 2022
Sometimes, people talk about making AI systems safe by taking examples where they fail and training them to do well on those. But how can we actually ...
16 - Preparing for Debate AI with Geoffrey Irving
01 Jul 2022
Many people in the AI alignment space have heard of AI safety via debate - check out AXRP episode 6 (axrp.net/episode/2021/04/08/episode-6-debate-beth...
15 - Natural Abstractions with John Wentworth
23 May 2022
Why does anybody care about natural abstractions? Do they somehow relate to math, or value learning? How do E. coli bacteria find sources of sugar? Al...
14 - Infra-Bayesian Physicalism with Vanessa Kosoy
05 Apr 2022
Late last year, Vanessa Kosoy and Alexander Appel published some research under the heading of "Infra-Bayesian physicalism". But wait - what was infra...
13 - First Principles of AGI Safety with Richard Ngo
31 Mar 2022
How should we think about artificial general intelligence (AGI), and the risks it might pose? What constraints exist on technical solutions to the pro...
12 - AI Existential Risk with Paul Christiano
02 Dec 2021
Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christ...
11 - Attainable Utility and Power with Alex Turner
25 Sep 2021
Many scary stories about AI involve an AI system deceiving and subjugating humans in order to gain the ability to achieve its goals without us stoppin...
10 - AI's Future and Impacts with Katja Grace
23 Jul 2021
When going about trying to ensure that AI does not cause an existential catastrophe, it's likely important to understand how AI will develop in the fu...
9 - Finite Factored Sets with Scott Garrabrant
24 Jun 2021
Being an agent can get loopy quickly. For instance, imagine that we're playing chess and I'm trying to decide what move to make. Your next move influe...
8 - Assistance Games with Dylan Hadfield-Menell
08 Jun 2021
How should we think about the technical problem of building smarter-than-human AI that does what we want? When and how should AI systems defer to us? ...
7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
28 May 2021
If you want to shape the development and forecast the consequences of powerful AI technology, it's important to know when it might appear. In this epi...
7 - Side Effects with Victoria Krakovna
14 May 2021
One way of thinking about how AI might pose an existential threat is by taking drastic actions to maximize its achievement of some objective function,...
6 - Debate and Imitative Generalization with Beth Barnes
08 Apr 2021
One proposal to train AIs that can be useful is to have ML models debate each other about the answer to a human-provided question, where the human jud...
5 - Infra-Bayesianism with Vanessa Kosoy
10 Mar 2021
The theory of sequential decision-making has a problem: how can we deal with situations where we have some hypotheses about the environment we're acti...
4 - Risks from Learned Optimization with Evan Hubinger
17 Feb 2021
In machine learning, typically optimization is done to produce a model that performs well according to some metric. Today's episode features Evan Hubi...
3 - Negotiable Reinforcement Learning with Andrew Critch
11 Dec 2020
In this episode, I talk with Andrew Critch about negotiable reinforcement learning: what happens when two people (or organizations, or what have you) ...
2 - Learning Human Biases with Rohin Shah
11 Dec 2020
One approach to creating useful AI systems is to watch humans doing a task, infer what they're trying to do, and then try to do that well. The simples...
1 - Adversarial Policies with Adam Gleave
11 Dec 2020
In this episode, Adam Gleave and I talk about adversarial policies. Basically, in current reinforcement learning, people train agents that act in some...