
BlueDot Narrated

Technology News Society & Culture

Activity Overview

Episode publication activity over the past year

Episodes

Showing 101-200 of 207
Page 2 of 3

Visualizing the Deep Learning Revolution

04 Jan 2025

Contributed by Lukas

The field of AI has undergone a revolution over the last decade, driven by the success of deep learning techniques. This post aims to convey three ide...

Intelligence Explosion: Evidence and Import

04 Jan 2025

Contributed by Lukas

It seems unlikely that humans are near the ceiling of possible intelligences, rather than simply being the first such intelligence that happened to ev...

On the Opportunities and Risks of Foundation Models

04 Jan 2025

Contributed by Lukas

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a w...

Machine Learning for Humans: Supervised Learning

04 Jan 2025

Contributed by Lukas

The two tasks of supervised learning: regression and classification. Linear regression, loss functions, and gradient descent. How much money will we ma...

Can We Scale Human Feedback for Complex AI Tasks?

04 Jan 2025

Contributed by Lukas

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique for steering large language models (LLMs) toward desired behavio...

Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

04 Jan 2025

Contributed by Lukas

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior...

Zoom In: An Introduction to Circuits

04 Jan 2025

Contributed by Lukas

By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks. Many important transition points in...

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

04 Jan 2025

Contributed by Lukas

Using a sparse autoencoder, we extract a large number of interpretable features from a one-layer transformer. Mechanistic interpretability seeks to und...

Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small

04 Jan 2025

Contributed by Lukas

Research in mechanistic interpretability seeks to explain behaviors of machine learning (ML) models in terms of their internal components. However, mo...

AI Watermarking Won’t Curb Disinformation

04 Jan 2025

Contributed by Lukas

Generative AI allows people to produce piles upon piles of images and words very quickly. It would be nice if there were some way to reliably distingu...

Introduction to Mechanistic Interpretability

04 Jan 2025

Contributed by Lukas

This introduction covers common mechanistic interpretability (mech interp) concepts, to prepare you for the rest of this session’s resources. Original text: https://aisafetyf...

We Need a Science of Evals

04 Jan 2025

Contributed by Lukas

This lays out a number of open questions in what the author calls a ‘Science of Evals’. Original text: https://www.apolloresearch.ai/blog/we...

Become a Person who Actually Does Things

04 Jan 2025

Contributed by Lukas

The next four weeks of the course are an opportunity for you to actually build a thing that moves you closer to contributing to AI Alignment, and we...

Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

04 Jan 2025

Contributed by Lukas

This paper surveys open problems and fundamental limitations of reinforcement learning from human feedback (RLHF), covering challenges with the human feedback itself, with reward models, and with the policies trained on them.

Emerging Processes for Frontier AI Safety

04 Jan 2025

Contributed by Lukas

The UK recognises the enormous opportunities that AI can unlock across our economy and our society. However, without appropriate guardrails, such tech...

Constitutional AI Harmlessness from AI Feedback

04 Jan 2025

Contributed by Lukas

This paper explains Anthropic’s constitutional AI approach, which is largely an extension of RLHF but with AIs replacing human demonstrators and hum...

Challenges in Evaluating AI Systems

04 Jan 2025

Contributed by Lukas

Most conversations around the societal impacts of artificial intelligence (AI) come down to discussing some quality of an AI system, such as its truth...

AI Control: Improving Safety Despite Intentional Subversion

04 Jan 2025

Contributed by Lukas

We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes eve...

Computing Power and the Governance of AI

04 Jan 2025

Contributed by Lukas

This post summarises a new report, “Computing Power and the Governance of Artificial Intelligence.” The full report is a collaboration between nin...

Working in AI Alignment

04 Jan 2025

Contributed by Lukas

This guide is written for people who are considering direct work on technical AI alignment. I expect it to be most useful for people who are not yet w...

Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points

04 Jan 2025

Contributed by Lukas

We took 10 years of research and what we’ve learned from advising 1,000+ people on how to build high-impact careers, compressed that into an eight-w...

How to Succeed as an Early-Stage Researcher: The “Lean Startup” Approach

04 Jan 2025

Contributed by Lukas

I am approaching the end of my AI governance PhD, and I’ve spent about 2.5 years as a researcher at FHI. During that time, I’ve learnt a lot about...

Being the (Pareto) Best in the World

04 Jan 2025

Contributed by Lukas

This introduces the concept of Pareto frontiers. The top comment by Rob Miles also ties it to comparative advantage. While reading, consider what Paret...

Writing, Briefly

04 Jan 2025

Contributed by Lukas

(In the process of answering an email, I accidentally wrote a tiny essay about writing. I usually spend weeks on an essay. This one took 67 minutes—...

Public by Default: How We Manage Information Visibility at Get on Board

04 Jan 2025

Contributed by Lukas

I’ve been obsessed with managing information and communications in a remote team since Get on Board started growing. Reducing the bus factor is a p...

How to Get Feedback

04 Jan 2025

Contributed by Lukas

Feedback is essential for learning. Whether you’re studying for a test, trying to improve in your work or want to master a difficult skill, you need...

Worst-Case Thinking in AI Alignment

04 Jan 2025

Contributed by Lukas

Alternative title: “When should you assume that what could go wrong, will go wrong?” Thanks to Mary Phuong and Ryan Greenblatt for helpful suggest...

Compute Trends Across Three Eras of Machine Learning

04 Jan 2025

Contributed by Lukas

This article explains key drivers of AI progress, describes how compute is calculated, and looks at how the amount of compute used to train AI m...

Empirical Findings Generalize Surprisingly Far

04 Jan 2025

Contributed by Lukas

Previously, I argued that emergent phenomena in machine learning mean that we can’t rely on current trends to predict what the future of ML will be ...

Low-Stakes Alignment

04 Jan 2025

Contributed by Lukas

Right now I’m working on finding a good objective to optimize with ML, rather than trying to make sure our models are robustly optimizing that objec...

Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions

04 Jan 2025

Contributed by Lukas

Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer o...

ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation

04 Jan 2025

Contributed by Lukas

This paper presents a technique to scan neural network based AI models to determine if they are trojaned. Pre-trained AI models may contain back-doors...

Imitative Generalisation (AKA ‘Learning the Prior’)

04 Jan 2025

Contributed by Lukas

This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here (referred to there as ‘Learning the Prior’) and ...

Toy Models of Superposition

04 Jan 2025

Contributed by Lukas

It would be very convenient if the individual neurons of artificial neural networks corresponded to cleanly interpretable features of the input. For e...

Discovering Latent Knowledge in Language Models Without Supervision

04 Jan 2025

Contributed by Lukas

Abstract: Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may rep...

An Investigation of Model-Free Planning

04 Jan 2025

Contributed by Lukas

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these ch...

Gradient Hacking: Definitions and Examples

04 Jan 2025

Contributed by Lukas

Gradient hacking is a hypothesized phenomenon where: a model has knowledge about possible training trajectories which isn’t being used by its trainin...

Intro to Brain-Like-AGI Safety

04 Jan 2025

Contributed by Lukas

(Sections 3.1-3.4, 6.1-6.2, and 7.1-7.5) Suppose we someday build an Artificial General Intelligence algorithm using similar principles of learning and...

Chinchilla’s Wild Implications

04 Jan 2025

Contributed by Lukas

This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchilla. The paper came out a f...

Deep Double Descent

04 Jan 2025

Contributed by Lukas

We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves a...

Eliciting Latent Knowledge

04 Jan 2025

Contributed by Lukas

In this post, we’ll present ARC’s approach to an open problem we think is central to aligning powerful machine learning (ML) systems: Suppose we ...

Illustrating Reinforcement Learning from Human Feedback (RLHF)

04 Jan 2025

Contributed by Lukas

This more technical article explains the motivations for a system like RLHF, and adds additional concrete details as to how the RLHF approach is appli...

This is How AI Will Transform How Science Gets Done

02 Jan 2025

Contributed by Lukas

This article by Eric Schmidt, former CEO of Google, explains existing use cases for AI in the scientific community and outlines ways that sufficiently...

If-Then Commitments for AI Risk Reduction

02 Jan 2025

Contributed by Lukas

This article from Holden Karnofsky, now a visiting scholar at the Carnegie Endowment for International Peace, discusses “If-Then” commitment...

So You Want to be a Policy Entrepreneur?

30 Dec 2024

Contributed by Lukas

This paper by academic Michael Mintrom defines policy entrepreneurs as “energetic actors who engage in collaborative efforts in and around govern...

Open-Sourcing Highly Capable Foundation Models: An Evaluation of Risks, Benefits, and Alternative Methods for Pursuing Open-Source Objectives

30 Dec 2024

Contributed by Lukas

This resource is the second of two on the benefits and risks of open-weights model release. In contrast, this paper expresses strong skepticism toward...

Considerations for Governing Open Foundation Models

30 Dec 2024

Contributed by Lukas

This resource is the first of two on the benefits and risks of open-weights model release. This paper broadly supports the open release of foundation ...

Driving U.S. Innovation in Artificial Intelligence: A Roadmap for Artificial Intelligence Policy in the United States Senate

22 May 2024

Contributed by Lukas

In the fall of 2023, the US Bipartisan Senate AI Working Group held AI Insight Forums with global leaders. Participants included the leaders of major AI l...

The AI Triad and What It Means for National Security Strategy

20 May 2024

Contributed by Lukas

In this paper from CSET, Ben Buchanan outlines a framework for understanding the inputs that power machine learning. Called “the AI Triad”, ...

Societal Adaptation to Advanced AI

20 May 2024

Contributed by Lukas

This paper explores the under-discussed strategies of adaptation and resilience to mitigate the risks of advanced AI systems. The authors present argu...

OECD AI Principles

13 May 2024

Contributed by Lukas

This document from the OECD is split into two sections: principles for responsible stewardship of trustworthy AI & national policies and internati...

Key facts: UNESCO’s Recommendation on the Ethics of Artificial Intelligence

13 May 2024

Contributed by Lukas

This summary of UNESCO’s Recommendation on the Ethics of AI outlines four core values, ten core principles, and eleven actionable policies for re...

The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023

13 May 2024

Contributed by Lukas

This statement was released by the UK Government as part of their Global AI Safety Summit from November 2023. It notes that frontier models pose uniqu...

A pro-innovation approach to AI regulation: government response

13 May 2024

Contributed by Lukas

This report by the UK’s Department for Science, Innovation and Technology outlines a regulatory framework for UK AI policy. Per the report, “...

China’s AI Regulations and How They Get Made

13 May 2024

Contributed by Lukas

This report from the Carnegie Endowment for International Peace summarizes Chinese AI policy as of mid-2023. It also provides analysis of the factors ...

High-level summary of the AI Act

13 May 2024

Contributed by Lukas

This primer by the Future of Life Institute highlights core elements of the EU AI Act. It includes a high-level summary alongside explanations of diff...

FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

13 May 2024

Contributed by Lukas

This fact sheet from The White House summarizes President Biden’s AI Executive Order from October 2023. The President’s AI EO represents the...

Recent U.S. Efforts on AI Policy

13 May 2024

Contributed by Lukas

This high-level overview by CISA summarizes major US policies on AI at the federal level. Important items worth further investigation include Executiv...

AI Index Report 2024, Chapter 7: Policy and Governance

13 May 2024

Contributed by Lukas

This yearly report from Stanford’s Institute for Human-Centered AI (HAI) tracks AI governance actions and broader trends in policies and legislation by governments a...

The Policy Playbook: Building a Systems-Oriented Approach to Technology and National Security Policy

05 May 2024

Contributed by Lukas

This report by the Center for Security and Emerging Technology first analyzes the tensions and tradeoffs between three strategic technology and nation...

Strengthening Resilience to AI Risk: A Guide for UK Policymakers

04 May 2024

Contributed by Lukas

This report from the Centre for Emerging Technology and Security and the Centre for Long-Term Resilience identifies different levers as they apply to ...

The Convergence of Artificial Intelligence and the Life Sciences: Safeguarding Technology, Rethinking Governance, and Preventing Catastrophe

03 May 2024

Contributed by Lukas

This report by the Nuclear Threat Initiative primarily focuses on how AI’s integration into biosciences could advance biotechnology but also pose...

What is AI Alignment?

01 May 2024

Contributed by Lukas

To address the risk of rogue AIs, we’ll have to align them. In this article, Adam Jones of BlueDot Impact introduces the concept of aligning AIs. He defi...

Rogue AIs

01 May 2024

Contributed by Lukas

This excerpt from CAIS’s AI Safety, Ethics, and Society textbook provides a deep dive into the CAIS resource from session three, focusing specifical...

An Overview of Catastrophic AI Risks

29 Apr 2024

Contributed by Lukas

This article from the Center for AI Safety provides an overview of ways that advanced AI could cause catastrophe. It groups catastrophic risks into fo...

Future Risks of Frontier AI

23 Apr 2024

Contributed by Lukas

This report from the UK’s Government Office for Science envisions five potential risk scenarios from frontier AI. Each scenario includes information...

What risks does AI pose?

23 Apr 2024

Contributed by Lukas

This resource, written by Adam Jones at BlueDot Impact, provides a comprehensive overview of the existing and anticipated risks of AI. As you’re ...

AI Could Defeat All Of Us Combined

22 Apr 2024

Contributed by Lukas

This blog post from Holden Karnofsky, Open Philanthropy’s Director of AI Strategy, explains how advanced AI might overpower humanity. It summarizes ...

The Economic Potential of Generative AI: The Next Productivity Frontier

16 Apr 2024

Contributed by Lukas

This report from McKinsey discusses the huge potential for economic growth that generative AI could bring, examining key drivers and exploring potenti...

Positive AI Economic Futures

16 Apr 2024

Contributed by Lukas

This insight report from the World Economic Forum summarizes some positive AI outcomes. Some proposed futures include AI enabling shared economic bene...

The Transformative Potential of Artificial Intelligence

16 Apr 2024

Contributed by Lukas

This paper by Ross Gruetzemacher and Jess Whittlestone examines the concept of transformative AI, which significantly impacts society without necessar...

Moore's Law for Everything

16 Apr 2024

Contributed by Lukas

This blog by Sam Altman, the CEO of OpenAI, provides insight into what AI company leaders are saying and thinking about their reasons for pursuing adv...

Visualizing the Deep Learning Revolution

13 May 2023

Contributed by Lukas

The field of AI has undergone a revolution over the last decade, driven by the success of deep learning techniques. This post aims to convey three ide...

A Short Introduction to Machine Learning

13 May 2023

Contributed by Lukas

Despite the current popularity of machine learning, I haven’t found any short introductions to it which quite match the way I prefer to introduce pe...

The AI Triad and What It Means for National Security Strategy

13 May 2023

Contributed by Lukas

A single sentence can summarize the complexities of modern artificial intelligence: Machine learning systems use computing power to execute algorithms...

Specification Gaming: The Flip Side of AI Ingenuity

13 May 2023

Contributed by Lukas

Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome. We have all had e...

As AI Agents Like Auto-GPT Speed up Generative AI Race, We All Need to Buckle Up

13 May 2023

Contributed by Lukas

If you thought the pace of AI development had sped up since the release of ChatGPT last November, well, buckle up. Thanks to the rise of autonomous AI...

The Need for Work on Technical AI Alignment

13 May 2023

Contributed by Lukas

This page gives an overview of the alignment problem. It describes our motivation for running courses about technical AI alignment. The terminology sh...

Overview of How AI Might Exacerbate Long-Running Catastrophic Risks

13 May 2023

Contributed by Lukas

Developments in AI could exacerbate long-running catastrophic risks, including bioterrorism, disinformation and resulting institutional dysfunction, m...

Avoiding Extreme Global Vulnerability as a Core AI Governance Problem

13 May 2023

Contributed by Lukas

Much has been written framing and articulating the AI governance problem from a catastrophic risks lens, but these writings have been scattered. This ...

AI Safety Seems Hard to Measure

13 May 2023

Contributed by Lukas

In previous pieces, I argued that there’s a real and large risk of AI systems’ developing dangerous goals of their own and defeating all of humani...

Nobody’s on the Ball on AGI Alignment

13 May 2023

Contributed by Lukas

Observing from afar, it’s easy to think there’s an abundance of people working on AGI safety. Everyone on your timeline is fretting about AI risk,...

Why Might Misaligned, Advanced AI Cause Catastrophe?

13 May 2023

Contributed by Lukas

You may have seen arguments (such as these) for why people might create and deploy advanced AI that is both power-seeking and misaligned with human in...

Emergent Deception and Emergent Optimization

13 May 2023

Contributed by Lukas

I’ve previously argued that machine learning systems often exhibit emergent capabilities, and that these capabilities could lead to unintended negat...

Frontier AI Regulation: Managing Emerging Risks to Public Safety

13 May 2023

Contributed by Lukas

Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper...

Model Evaluation for Extreme Risks

13 May 2023

Contributed by Lukas

Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in A...

Primer on Safety Standards and Regulations for Industrial-Scale AI Development

13 May 2023

Contributed by Lukas

This primer introduces various aspects of safety standards and regulations for industrial-scale AI development: what they are, their potential and lim...

Racing Through a Minefield: The AI Deployment Problem

13 May 2023

Contributed by Lukas

Push AI forward too fast, and catastrophe could occur. Too slow, and someone else less cautious could do it. Is there a safe course? Source: https://www...

Choking off China’s Access to the Future of AI

13 May 2023

Contributed by Lukas

Introduction On October 7, 2022, the Biden administration announced a new export controls policy on artificial intelligence (AI) and semiconductor tec...

Primer on AI Chips and AI Governance

13 May 2023

Contributed by Lukas

If governments could regulate the large-scale use of “AI chips,” that would likely enable governments to govern frontier AI development—to decid...

The State of AI in Different Countries — An Overview

13 May 2023

Contributed by Lukas

Some are concerned that regulating AI progress in one country will slow that country down, putting it at a disadvantage in a global AI arms race. Many...

What Does It Take to Catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring

13 May 2023

Contributed by Lukas

As advanced machine learning systems’ capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that...

A Tour of Emerging Cryptographic Technologies

13 May 2023

Contributed by Lukas

Historically, progress in the field of cryptography has been enormously consequential. Over the past century, for instance, cryptographic discoveries ...

Historical Case Studies of Technology Governance and International Agreements

13 May 2023

Contributed by Lukas

The following excerpts summarize historical case studies that are arguably informative for AI governance. The case studies span nuclear arms control, ...

12 Tentative Ideas for US AI Policy

13 May 2023

Contributed by Lukas

About two years ago, I wrote that “it’s difficult to know which ‘intermediate goals’ [e.g. policy goals] we could pursue that, if achieved, wo...

Let’s Think About Slowing Down AI

13 May 2023

Contributed by Lukas

If you fear that someone will build a machine that will seize control of the world and annihilate humanity, then one kind of response is to try to bui...

What AI Companies Can Do Today to Help With the Most Important Century

13 May 2023

Contributed by Lukas

I’ve been writing about tangible things we can do today to help the most important century go well. Previously, I wrote about helpful messages to sp...

OpenAI Charter

13 May 2023

Contributed by Lukas

Our Charter describes the principles we use to execute on OpenAI’s mission. Source: https://openai.com/charter. Narrated by TYPE III AUDIO. A...

LP Announcement by OpenAI

13 May 2023

Contributed by Lukas

We’ve created OpenAI LP, a new “capped-profit” company that allows us to rapidly increase our investments in compute and talent while including ...

International Institutions for Advanced AI

13 May 2023

Contributed by Lukas

International institutions may have an important role to play in ensuring advanced AI systems benefit humanity. International collaborations can unloc...
