LessWrong (Curated & Popular)
Episodes
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
04 Sep 2023
Contributed by Lukas
In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al. 2022), the authors studied language model "sycophancy"...
"Dear Self; we need to talk about ambition" by Elizabeth
30 Aug 2023
Contributed by Lukas
I keep seeing advice on ambition, aimed at people in college or early in their career, that would have been really bad for me at similar ages. Rather ...
"Assume Bad Faith" by Zack_M_Davis
28 Aug 2023
Contributed by Lukas
I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the...
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon
28 Aug 2023
Contributed by Lukas
The Carving of Reality, the third volume of the Best of LessWrong books, is now available on Amazon (US). The Carving of Reality includes 43 essays from 29 ...
"Large Language Models will be Great for Censorship" by Ethan Edwards
23 Aug 2023
Contributed by Lukas
LLMs can do many incredible things. They can generate unique creative content, carry on long conversations in any number of subjects, complete complex...
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov
22 Aug 2023
Contributed by Lukas
Intro: I am a psychotherapist, and I help people working on AI safety. I noticed patterns of mental health issues highly specific to this group. It's...
"Ten Thousand Years of Solitude" by agp
22 Aug 2023
Contributed by Lukas
This is a linkpost for the article "Ten Thousand Years of Solitude", written by Jared Diamond for Discover Magazine in 1993, four years befo...
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
21 Aug 2023
Contributed by Lukas
I gave a talk about the different risk models, followed by an interpretability presentation; then I got a problematic question: "I don't und...
"Inflection.ai is a major AGI lab" by Nikola
15 Aug 2023
Contributed by Lukas
Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude to Meta, OpenAI, Dee...
"Feedbackloop-first Rationality" by Raemon
15 Aug 2023
Contributed by Lukas
I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teachin...
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
09 Aug 2023
Contributed by Lukas
TL;DR: This document lays out the case for research on “model organisms of misalignment” – in vitro demonstrations of the kinds of failures that...
"When can we trust model evaluations?" bu evhub
09 Aug 2023
Contributed by Lukas
In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to re...
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes
04 Aug 2023
Contributed by Lukas
Blogpost version | Paper
We have just released our first public report. It introduces methodology for assessing the capacity of LLM agents to acquire reso...
"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long
04 Aug 2023
Contributed by Lukas
Summary of Argument: The public debate among AI experts is confusing because there are, to a first approximation, three sides, not two sides to the de...
"My current LK99 questions" by Eliezer Yudkowsky
04 Aug 2023
Contributed by Lukas
So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors a...
"Thoughts on sharing information about language model capabilities" by paulfchristiano
02 Aug 2023
Contributed by Lukas
I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduc...
"Cultivating a state of mind where new ideas are born" by Henrik Karlsson
31 Jul 2023
Contributed by Lukas
In the early 2010s, a popular idea was to provide coworking spaces and shared living to people who were building startups. That way the founders would...
"Self-driving car bets" by paulfchristiano
31 Jul 2023
Contributed by Lukas
This month I lost a bunch of bets. Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023...
"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth
31 Jul 2023
Contributed by Lukas
Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That’...
"Grant applications and grand narratives" by Elizabeth
28 Jul 2023
Contributed by Lukas
The Lightspeed application asks: “What impact will [your project] have on the world? What is your project’s goal, how will you know if you’ve ...
"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel
28 Jul 2023
Contributed by Lukas
Previously, Jacob Cannell wrote the post "Brain Efficiency", which makes several radical claims: that the brain is at the Pareto frontier of s...
"Rationality !== Winning" by Raemon
28 Jul 2023
Contributed by Lukas
I think "Rationality is winning" is a bit of a trap. (The original phrase is notably "rationality is systematized winning", which...
"Cryonics and Regret" by MvB
28 Jul 2023
Contributed by Lukas
This post is not about arguments in favor of or against cryonics. I would just like to share a particular emotional response of mine as the topic beca...
"Unifying Bargaining Notions (2/2)" by Diffractor
12 Jun 2023
Contributed by Lukas
Alright, time for the payoff, unifying everything discussed in the previous post. This post is a lot more mathematically dense; you might want to dige...
"The ants and the grasshopper" by Richard Ngo
06 Jun 2023
Contributed by Lukas
Inspired by Aesop, Soren Kierkegaard, Robin Hanson, sadoeuphemist and Ben Hoffman. One winter a grasshopper, starving and frail, approaches a colony of...
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.
18 May 2023
Contributed by Lukas
Summary: We demonstrate a new scalable way of interacting with language models: adding certain activation vectors into forward passes. Essentially, we...
"An artificially structured argument for expecting AGI ruin" by Rob Bensinger
16 May 2023
Contributed by Lukas
Philosopher David Chalmers asked: "Is there a canonical source for "the argument for AGI ruin" somewhere, preferably laid out as an exp...
"How much do you believe your results?" by Eric Neyman
10 May 2023
Contributed by Lukas
You are the director of a giant government research program that’s conducting randomized controlled trials (RCTs) on two thousand health interventio...
"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango
27 Apr 2023
Contributed by Lukas
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintai...
"On AutoGPT" by Zvi
19 Apr 2023
Contributed by Lukas
The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all...
"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky
12 Apr 2023
Contributed by Lukas
(Related text posted to Twitter; this version is edited and has a more advanced final section.) Imagine yourself in a box, trying to predict the next w...
"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares
05 Apr 2023
Contributed by Lukas
https://www.lesswrong.com/posts/fJBTRa7m7KnCDdzG5/a-stylized-dialogue-on-john-wentworth-s-claims-about-markets
(This is a stylized version of a real co...
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
05 Apr 2023
Contributed by Lukas
https://www.lesswrong.com/posts/iy2o4nQj9DnQD7Yhj/discussion-with-nate-soares-on-a-key-alignment-difficulty
Crossposted from the AI Alignment Forum. Ma...
"Deep Deceptiveness" by Nate Soares
05 Apr 2023
Contributed by Lukas
https://www.lesswrong.com/posts/XWwvwytieLtEWaFJX/deep-deceptiveness
This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment)...
"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/nTGEeRSZrfPiJwkEc/the-onion-test-for-personal-and-institutional-honesty
[co-written by Chana Messinger and Andrew Critc...
"There’s no such thing as a tree (phylogenetically)" by Eukaryote
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/fRwdkop6tyhi3d22L/there-s-no-such-thing-as-a-tree-phylogenetically
This is a linkpost for https://eukaryotewritesblog.c...
"Losing the root for the tree" by Adam Zerner
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/ma7FSEtumkve8czGF/losing-the-root-for-the-tree
You know that being healthy is important. And that there's a lot of...
"It Looks Like You’re Trying To Take Over The World" by Gwern
28 Mar 2023
Contributed by Lukas
https://gwern.net/fiction/clippy
In A.D. 20XX. Work was beginning. “How are you gentlemen !!”… (Work. Work never changes; work is always hell.) Sp...
"Why I think strong general AI is coming soon" by Porby
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-think-strong-general-ai-is-coming-soon
I think there is little time left before someone builds ...
"What failure looks like" by Paul Christiano
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like
Crossposted from the AI Alignment Forum. May contain more technical jargon th...
"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options
This is an essay about one of those "once you see it, you ...
""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon
21 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard
In addition to technical challenges, plans ...
"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes
21 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/4Gt42jX7RiaNaxCwP/more-information-about-the-dangerous-capability-evaluations
Crossposted from the AI Alignment Forum. ...
"Enemies vs Malefactors" by Nate Soares
14 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors
Status: some mix of common wisdom (that bears repeating in our particular cont...
"The Parable of the King and the Random Process" by moridinamael
14 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process
~ A Parable of Forecasting Under Model Uncertainty ~ Yo...
"The Waluigi Effect (mega-post)" by Cleo Nardo
08 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
In this article, I will present a mechanistic explanation of the Waluigi...
"Acausal normalcy" by Andrew Critch
06 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy
Crossposted from the AI Alignment Forum. May contain more technical jargon than usua...
"Please don't throw your mind away" by TsviBT
01 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away
[Warning: the following dialogue contains an incidental spoiler for...
"Cyborgism" by Nicholas Kees & Janus
15 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism
There is a lot of disagreement and confusion about the feasibility and risks associated wit...
"Childhoods of exceptional people" by Henrik Karlsson
14 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people
This is a linkpost for https://escapingflatland.substack.com/p/child...
"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares
13 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making
Crossposted from the AI Alignment Forum. May c...
"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça
10 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas
A Chemical Hunger (a), a series by the authors...
"SolidGoldMagikarp (plus, prompt generation)"
08 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
Work done at SERI-MATS, over the past two months, by Jessica...
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares
03 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s
Writing down something I’ve found myself repe...
"Basics of Rationalist Discourse" by Duncan Sabien
02 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1
Introduction: This post is meant to be a linkable resource. Its core ...
"My Model Of EA Burnout" by Logan Strohl
31 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout
(Probably somebody else has said most of this. But I personally haven't r...
"Sapir-Whorf for Rationalists" by Duncan Sabien
31 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists
Casus Belli: As I was scanning over my (rather long) list of essays-to-w...
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo
25 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers
This is a linkpost for https://novum.substack.com/p/social-recess...
"Recursive Middle Manager Hell" by Raemon
24 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell
I think Zvi's Immoral Mazes sequence is really important, but come...
"The Feeling of Idea Scarcity" by John Wentworth
12 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity
Here’s a story you may recognize. There's a bright up-and-coming ...
"Models Don't 'Get Reward'" by Sam Ringer
12 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward
Crossposted from the AI Alignment Forum. May contain more technical jargon th...
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin
12 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without
Crossposted from the AI Alignment Forum. ...
"The next decades might be wild" by Marius Hobbhahn
21 Dec 2022
Contributed by Lukas
https://www.lesswrong.com/posts/qRtD4WqKRYEtT5pi3/the-next-decades-might-be-wild
Crossposted from the AI Alignment Forum. May contain more technical ja...
"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn
17 Nov 2022
Contributed by Lukas
https://www.lesswrong.com/posts/SqjQFhn5KTarfW8v7/lessons-learned-from-talking-to-greater-than-100-academics
Crossposted from the AI Alignment Forum. M...
"How my team at Lightcone sometimes gets stuff done" by jacobjacob
10 Nov 2022
Contributed by Lukas
https://www.lesswrong.com/posts/6LzKRP88mhL9NKNrS/how-my-team-at-lightcone-sometimes-gets-stuff-done
Disclaimer: I originally wrote this as a private d...
"Decision theory does not imply that we get to have nice things" by So8res
08 Nov 2022
Contributed by Lukas
https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice
Crossposted from the AI Alignment Forum. May ...
"What 2026 looks like" by Daniel Kokotajlo
07 Nov 2022
Contributed by Lukas
https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like#2022
Crossposted from the AI Alignment Forum. May contain more technical jargon ...
"Counterarguments to the basic AI x-risk case" by Katja Grace
04 Nov 2022
Contributed by Lukas
"Introduction to abstract entropy" by Alex Altair
29 Oct 2022
Contributed by Lukas
https://www.lesswrong.com/posts/REA49tL5jsh69X3aM/introduction-to-abstract-entropy#fnrefpi8b39u5hd7
This post, and much of the following sequence, was ...
"Consider your appetite for disagreements" by Adam Zerner
25 Oct 2022
Contributed by Lukas
https://www.lesswrong.com/posts/8vesjeKybhRggaEpT/consider-your-appetite-for-disagreements
Poker: There was a time about five years ago where I was tryin...
"My resentful story of becoming a medical miracle" by Elizabeth
21 Oct 2022
Contributed by Lukas
https://www.lesswrong.com/posts/fFY2HeC9i2Tx8FEnK/my-resentful-story-of-becoming-a-medical-miracle
This is a linkpost for https://acesounderglass.com/2...
"The Redaction Machine" by Ben
02 Oct 2022
Contributed by Lukas
https://www.lesswrong.com/posts/CKgPFHoWFkviYz7CB/the-redaction-machine
On the 3rd of October 2351 a machine flared to life. Huge energies coursed into...
"Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" by Ajeya Cotra
27 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to
Crossposted from the AI Alignment Forum. May con...
"The shard theory of human values" by Quintin Pope & TurnTrout
22 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values
TL;DR: We propose a theory of human value formation. According to th...
"Two-year update on my personal AI timelines" by Ajeya Cotra
22 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines#fnref-fwwPpQFdWM6hJqwuY-12
Crossposted from the AI Alignm...
"You Are Not Measuring What You Think You Are Measuring" by John Wentworth
21 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring
Eight years ago, I worked as a data scientist ...
"Do bamboos set themselves on fire?" by Malmesbury
20 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/WNpvK67MjREgvB8u8/do-bamboos-set-themselves-on-fire
Cross-posted from Telescopic Turnip. As we all know, the best place ...
"Survey advice" by Katja Grace
18 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/oyKzz7bvcZMEPaDs6/survey-advice
Things I believe about making surveys, after making some surveys: If you write a questio...
"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith
18 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/J3wemDGtsy5gzD3xa/toni-kurz-and-the-insanity-of-climbing-mountains
Content warning: death. I've been on a YouTube bi...
"Deliberate Grieving" by Raemon
18 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/gs3vp3ukPbpaEie5L/deliberate-grieving-1
This post is hopefully useful on its own, but begins a series ultimately abou...
"Toolbox-thinking and Law-thinking" by Eliezer Yudkowsky
15 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/s/6xgy8XYEisLk3tCjH/p/CPP2uLcaywEokFKQG
TL;DR: I've noticed a dichotomy between "thinking in toolboxes" and "...
"Local Validity as a Key to Sanity and Civilization" by Eliezer Yudkowsky
15 Sep 2022
Contributed by Lukas
"Humans are not automatically strategic" by Anna Salamon
15 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/PBRWb2Em5SNeWYwwB/humans-are-not-automatically-strategic
Reply to: A "Failure to Evaluate Return-on-Time" Fal...
"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC
15 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next
Crossposted from the AI Alignment Forum....
"Moral strategies at different capability levels" by Richard Ngo
14 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/jDQm7YJxLnMnSNHFu/moral-strategies-at-different-capability-levels
Crossposted from the AI Alignment Forum. May contain ...
"Worlds Where Iterative Design Fails" by John Wentworth
11 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails
Crossposted from the AI Alignment Forum. May contain more technic...
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
11 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is
Despite a clear need for it, a good sourc...
"Unifying Bargaining Notions (1/2)" by Diffractor
09 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/rYDas2DDGGDRc8gGB/unifying-bargaining-notions-1-2
Crossposted from the AI Alignment Forum. May contain more technical j...
"Simulators" by Janus
05 Sep 2022
Contributed by Lukas
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators#fncrt8wagfir9
Summary. TL;DR: Self-supervised learning may create AGI or its foundation. Wha...
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope
08 Aug 2022
Contributed by Lukas
https://www.lesswrong.com/posts/CjFZeDD6iCnNubDoS/humans-provide-an-untapped-wealth-of-evidence-about#fnref7a5ti4623qb
Crossposted from the AI Align...
"Changing the world through slack & hobbies" by Steven Byrnes
30 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/DdDt5NXkfuxAnAvGJ/changing-the-world-through-slack-and-hobbies
Introduction: In EA orthodoxy, if you're really...
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch
28 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory
Crossposted from the AI Alignment Foru...
"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger
24 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/MdZyLnLHuaHrCskjy/itt-passing-and-civility-are-good-charity-is-bad
I often object to claims like "charity/steelm...
"What should you change in response to an "emergency"? And AI risk" by Anna Salamon
23 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/mmHctwkKjpvaQdC3c/what-should-you-change-in-response-to-an-emergency-and-ai
Related to: Slack gives you the ability ...
"On how various plans miss the hard bits of the alignment challenge" by Nate Soares
17 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment
Crossposted from the AI Alignment Forum....
"Humans are very reliable agents" by Alyssa Vance
13 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/28zsuPaJpKAGSX4zq/humans-are-very-reliable-agents
Over the last few years, deep-learning-based AI has progressed ext...
"Looking back on my alignment PhD" by TurnTrout
08 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/2GxhAyn9aHqukap2S/looking-back-on-my-alignment-phd
The funny thing about long periods of time is that they do, eventu...
"It’s Probably Not Lithium" by Natália Coelho Mendonça
05 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/7iAABhWpcGeP5e6SB/it-s-probably-not-lithium
A Chemical Hunger (a), a series by the authors of the blog Slime Mold Tim...
"What Are You Tracking In Your Head?" by John Wentworth
02 Jul 2022
Contributed by Lukas
https://www.lesswrong.com/posts/bhLxWTkRc8GXunFcB/what-are-you-tracking-in-your-head
A large chunk - plausibly the majority - of real-world experti...
"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood
29 Jun 2022
Contributed by Lukas
https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security
Background: I have been doing red team, blu...