LessWrong (Curated & Popular)
Episodes
"Announcing Timaeus" by Jesse Hoogland et al.
30 Oct 2023
Contributed by Lukas
Timaeus is a new AI safety research organization dedicated to making fundamental breakthroughs in technical AI alignment using deep ideas from mathema...
[HUMAN VOICE] "Alignment Implications of LLM Successes: a Debate in One Act" by Zack M Davis
23 Oct 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated Doomimir: Humanity has made no progress on the alignment problem. Not only ...
"Holly Elmore and Rob Miles dialogue on AI Safety Advocacy" by jacobjacob, Robert Miles & Holly_Elmore
23 Oct 2023
Contributed by Lukas
Holly is an independent AI Pause organizer, which includes organizing protests (like this upcoming one). Rob is an AI Safety YouTuber. I (jacobjacob) ...
"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.
23 Oct 2023
Contributed by Lukas
Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Jeffrey Ladish. TL;DR LoRA fine-tunin...
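For context on the technique named in the title: LoRA adds small trainable low-rank matrices on top of frozen base weights, so fine-tuning touches only a tiny fraction of parameters. A minimal sketch of that idea (illustrative only, not the authors' code; names like LoRALinear and the r/alpha defaults are assumptions):
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```
Because only A and B are trained, an adapter like this is cheap to fit, which is part of why the post's result (cheaply undoing safety training) is notable.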
"Labs should be explicit about why they are building AGI" by Peter Barnett
19 Oct 2023
Contributed by Lukas
Three of the big AI labs say that they care about alignment and that they think misaligned AI poses a potentially existential threat to humanity. Thes...
[HUMAN VOICE] "Sum-threshold attacks" by TsviBT
18 Oct 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated How do you affect something far away, a lot, without anyone noticing? (Note:...
"Will no one rid me of this turbulent pest?" by Metacelsus
18 Oct 2023
Contributed by Lukas
Last year, I wrote about the promise of gene drives to wipe out mosquito species and end malaria. In the time since my previous writing, gene drives ha...
[HUMAN VOICE] "Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
15 Oct 2023
Contributed by Lukas
Patreon to support human narration. (Narrations will remain freely available on this feed, but you can optionally support them if you'd like me t...
"RSPs are pauses done right" by evhub
15 Oct 2023
Contributed by Lukas
COI: I am a research scientist at Anthropic, where I work on model organisms of misalignment; I was also involved in the drafting process for Anthropi...
"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI
15 Oct 2023
Contributed by Lukas
Readers may have noticed many similarities between Anthropic's recent publication Towards Monosemanticity: Decomposing Language Models With Dicti...
"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba
15 Oct 2023
Contributed by Lukas
In 2023, MIRI has shifted focus in the direction of broad public communication—see, for example, our recent TED talk, our piece in TIME magazine “...
"Cohabitive Games so Far" by mako yass
15 Oct 2023
Contributed by Lukas
A cohabitive game[1] is a partially cooperative, partially competitive multiplayer game that provides an anarchic dojo for development in applied coop...
"Announcing Dialogues" by Ben Pace
09 Oct 2023
Contributed by Lukas
As of today, everyone is able to create a new type of content on LessWrong: Dialogues. In contrast with posts, which are for monologues, and comment se...
"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi
09 Oct 2023
Contributed by Lukas
Response to: Evolution Provides No Evidence For the Sharp Left Turn, due to it winning first prize in The Open Philanthropy Worldviews contest. Quint...
"Evaluating the historical value misspecification argument" by Matthew Barnett
09 Oct 2023
Contributed by Lukas
ETA: I'm not saying that MIRI thought AIs wouldn't understand human values. If there's only one thing you take away from this post, ple...
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
09 Oct 2023
Contributed by Lukas
Neural networks are trained on data, not programmed to follow rules. We understand the math of the trained network exactly – each neuron in a neural...
"Thomas Kwa's MIRI research experience" by Thomas Kwa and others
06 Oct 2023
Contributed by Lukas
Moderator note: the following is a dialogue using LessWrong’s new dialogue feature. The exchange is not completed: new replies might be added contin...
"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal
03 Oct 2023
Contributed by Lukas
A lot of people are highly concerned that a malevolent AI or insane human will, in the near future, set out to destroy humanity. If such an entity wan...
"The Lighthaven Campus is open for bookings" by Habryka
03 Oct 2023
Contributed by Lukas
Lightcone Infrastructure (the organization that grew from and houses the LessWrong team) has just finished renovating a 7-building physical campus tha...
"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.
03 Oct 2023
Contributed by Lukas
Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrab...
"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth
03 Oct 2023
Contributed by Lukas
Effective altruism prides itself on truthseeking. That pride is justified in the sense that EA is better at truthseeking than most members of its refe...
"The King and the Golem" by Richard Ngo
29 Sep 2023
Contributed by Lukas
This is a linkpost for https://narrativeark.substack.com/p/the-king-and-the-golem Long ago there was a mighty king who had everything in the world that...
"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al
27 Sep 2023
Contributed by Lukas
This is a linkpost for Sparse Autoencoders Find Highly Interpretable Directions in Language Models. We use a scalable and unsupervised method called Spa...
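The method centers on training a sparse autoencoder on a model's internal activations. A minimal sketch of such an autoencoder, assuming a generic ReLU encoder and L1 sparsity penalty rather than the paper's exact architecture (SparseAutoencoder, loss_fn, and l1_coeff are illustrative names):
```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder trained to reconstruct activations from sparse codes."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # d_hidden > d_model (overcomplete)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))  # nonnegative, hopefully sparse features
        recon = self.decoder(codes)
        return recon, codes

def loss_fn(recon, acts, codes, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes codes toward sparsity,
    # which is what makes the learned directions candidate interpretable features.
    return torch.mean((recon - acts) ** 2) + l1_coeff * codes.abs().mean()
```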
"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
26 Sep 2023
Contributed by Lukas
Epistemic status: model which I find sometimes useful, and which emphasizes some true things about many parts of the world which common alternative mo...
"There should be more AI safety orgs" by Marius Hobbhahn
25 Sep 2023
Contributed by Lukas
I’m writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of Apollo Research or any other ...
"The Talk: a brief explanation of sexual dimorphism" by Malmesbury
22 Sep 2023
Contributed by Lukas
Cross-posted from substack. "Everything in the world is about sex, except sex. Sex is about clonal interference." – Oscar Wilde (kind of) As ...
"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob
20 Sep 2023
Contributed by Lukas
Patrick Collison has a fantastic list of examples of people quickly accomplishing ambitious things together since the 19th Century. It does make you y...
"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker
19 Sep 2023
Contributed by Lukas
This is a linkpost for https://www.youtube.com/watch?v=02kbWY5mahQ None of the presidents fully represent my (TurnTrout's) views. TurnTrout wrote t...
"UDT shows that decision theory is more puzzling than ever" by Wei Dai
18 Sep 2023
Contributed by Lukas
I feel like MIRI perhaps mispositioned FDT (their variant of UDT) as a clear advancement in decision theory, whereas maybe they could have attracted m...
"Sum-threshold attacks" by TsviBT
11 Sep 2023
Contributed by Lukas
How do you affect something far away, a lot, without anyone noticing? (Note: you can safely skip sections. It is also safe to skip the essay entirely, ...
"A list of core AI safety problems and how I hope to solve them" by Davidad
09 Sep 2023
Contributed by Lukas
Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try...
"Report on Frontier Model Training" by Yafah Edelman
09 Sep 2023
Contributed by Lukas
This is a linkpost for https://docs.google.com/document/d/1TsYkDYtV6BKiCN9PAOirRAy3TrNDu2XncUZ5UZfaAKA/edit?usp=sharing Understanding what drives the r...
"One Minute Every Moment" by abramdemski
08 Sep 2023
Contributed by Lukas
About how much information are we keeping in working memory at a given moment? "Miller's Law" dictates that the number of things humans ...
"Sharing Information About Nonlinear" by Ben Pace
08 Sep 2023
Contributed by Lukas
Added (11th Sept): Nonlinear have commented that they intend to write a response, have written a short follow-up, and claim that they dispute 85 claim...
"Defunding My Mistake" by ymeskhout
08 Sep 2023
Contributed by Lukas
Until about five years ago, I unironically parroted the slogan All Cops Are Bastards (ACAB) and earnestly advocated to abolish the police and prison s...
"What I would do if I wasn’t at ARC Evals" by LawrenceC
08 Sep 2023
Contributed by Lukas
In which: I list 9 projects that I would work on if I wasn’t busy working on safety standards at ARC Evals, and explain why they might be good to wo...
"Meta Questions about Metaphilosophy" by Wei Dai
04 Sep 2023
Contributed by Lukas
To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age th...
"The U.S. is becoming less stable" by lc
04 Sep 2023
Contributed by Lukas
We focus so much on arguing over who is at fault in this country that I think sometimes we fail to alert on what's actually happening. I would ju...
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
04 Sep 2023
Contributed by Lukas
In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al., 2022), the authors studied language model "sycophancy"...
"Dear Self; we need to talk about ambition" by Elizabeth
30 Aug 2023
Contributed by Lukas
I keep seeing advice on ambition, aimed at people in college or early in their career, that would have been really bad for me at similar ages. Rather ...
"Assume Bad Faith" by Zack_M_Davis
28 Aug 2023
Contributed by Lukas
I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the...
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon
28 Aug 2023
Contributed by Lukas
The Carving of Reality, third volume of the Best of LessWrong books, is now available on Amazon (US). The Carving of Reality includes 43 essays from 29 ...
"Large Language Models will be Great for Censorship" by Ethan Edwards
23 Aug 2023
Contributed by Lukas
LLMs can do many incredible things. They can generate unique creative content, carry on long conversations in any number of subjects, complete complex...
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov
22 Aug 2023
Contributed by Lukas
Intro: I am a psychotherapist, and I help people working on AI safety. I noticed patterns of mental health issues highly specific to this group. It'...
"Ten Thousand Years of Solitude" by agp
22 Aug 2023
Contributed by Lukas
This is a linkpost for the article "Ten Thousand Years of Solitude", written by Jared Diamond for Discover Magazine in 1993, four years befo...
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
21 Aug 2023
Contributed by Lukas
I gave a talk about the different risk models, followed by an interpretability presentation, then I got a problematic question, "I don't und...
"Inflection.ai is a major AGI lab" by Nikola
15 Aug 2023
Contributed by Lukas
Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude as Meta, OpenAI, Dee...
"Feedbackloop-first Rationality" by Raemon
15 Aug 2023
Contributed by Lukas
I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teachin...
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
09 Aug 2023
Contributed by Lukas
TL;DR: This document lays out the case for research on “model organisms of misalignment” – in vitro demonstrations of the kinds of failures that...
"When can we trust model evaluations?" bu evhub
09 Aug 2023
Contributed by Lukas
In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to re...
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes
04 Aug 2023
Contributed by Lukas
Blogpost version. Paper. We have just released our first public report. It introduces a methodology for assessing the capacity of LLM agents to acquire reso...
"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long
04 Aug 2023
Contributed by Lukas
Summary of Argument: The public debate among AI experts is confusing because there are, to a first approximation, three sides, not two sides to the de...
"My current LK99 questions" by Eliezer Yudkowsky
04 Aug 2023
Contributed by Lukas
So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors a...
"Thoughts on sharing information about language model capabilities" by paulfchristiano
02 Aug 2023
Contributed by Lukas
I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduc...
"Cultivating a state of mind where new ideas are born" by Henrik Karlsson
31 Jul 2023
Contributed by Lukas
In the early 2010s, a popular idea was to provide coworking spaces and shared living to people who were building startups. That way the founders would...
"Self-driving car bets" by paulfchristiano
31 Jul 2023
Contributed by Lukas
This month I lost a bunch of bets. Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023...
"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth
31 Jul 2023
Contributed by Lukas
Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That’...
"Grant applications and grand narratives" by Elizabeth
28 Jul 2023
Contributed by Lukas
The Lightspeed application asks: “What impact will [your project] have on the world? What is your project’s goal, how will you know if you’ve ...
"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel
28 Jul 2023
Contributed by Lukas
Previously Jacob Cannell wrote the post "Brain Efficiency", which makes several radical claims: that the brain is at the Pareto frontier of s...
"Rationality !== Winning" by Raemon
28 Jul 2023
Contributed by Lukas
I think "Rationality is winning" is a bit of a trap. (The original phrase is notably "rationality is systematized winning", which...
"Cryonics and Regret" by MvB
28 Jul 2023
Contributed by Lukas
This post is not about arguments in favor of or against cryonics. I would just like to share a particular emotional response of mine as the topic beca...
"Unifying Bargaining Notions (2/2)" by Diffractor
12 Jun 2023
Contributed by Lukas
Alright, time for the payoff: unifying everything discussed in the previous post. This post is a lot more mathematically dense; you might want to dige...
"The ants and the grasshopper" by Richard Ngo
06 Jun 2023
Contributed by Lukas
Inspired by Aesop, Søren Kierkegaard, Robin Hanson, sadoeuphemist and Ben Hoffman. One winter a grasshopper, starving and frail, approaches a colony of...
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.
18 May 2023
Contributed by Lukas
Summary: We demonstrate a new scalable way of interacting with language models: adding certain activation vectors into forward passes. Essentially, we...
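The technique, often called activation addition, injects a chosen vector into a layer's activations during the forward pass. A minimal PyTorch sketch of that idea, assuming a generic hook-based injection (the function name, module path, and scale below are illustrative, not the post's exact recipe):
```python
import torch

def add_steering_vector(layer, vector, scale=1.0):
    """Register a forward hook that adds `vector` to the layer's output activations."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * vector.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage: steer with the difference of two prompt activations,
# e.g. activations("Love") - activations("Hate"), at some transformer block:
# handle = add_steering_vector(model.transformer.h[6], love_minus_hate, scale=5.0)
# ...generate text with the hook active...
# handle.remove()
```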
"An artificially structured argument for expecting AGI ruin" by Rob Bensinger
16 May 2023
Contributed by Lukas
Philosopher David Chalmers asked: "Is there a canonical source for "the argument for AGI ruin" somewhere, preferably laid out as an exp...
"How much do you believe your results?" by Eric Neyman
10 May 2023
Contributed by Lukas
You are the director of a giant government research program that’s conducting randomized controlled trials (RCTs) on two thousand health interventio...
"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango
27 Apr 2023
Contributed by Lukas
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintai...
"On AutoGPT" by Zvi
19 Apr 2023
Contributed by Lukas
The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all...
"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky
12 Apr 2023
Contributed by Lukas
(Related text posted to Twitter; this version is edited and has a more advanced final section.) Imagine yourself in a box, trying to predict the next w...
"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares
05 Apr 2023
Contributed by Lukas
https://www.lesswrong.com/posts/fJBTRa7m7KnCDdzG5/a-stylized-dialogue-on-john-wentworth-s-claims-about-markets (This is a stylized version of a real co...
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
05 Apr 2023
Contributed by Lukas
https://www.lesswrong.com/posts/iy2o4nQj9DnQD7Yhj/discussion-with-nate-soares-on-a-key-alignment-difficulty Crossposted from the AI Alignment Forum. Ma...
"Deep Deceptiveness" by Nate Soares
05 Apr 2023
Contributed by Lukas
https://www.lesswrong.com/posts/XWwvwytieLtEWaFJX/deep-deceptiveness This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment)...
"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/nTGEeRSZrfPiJwkEc/the-onion-test-for-personal-and-institutional-honesty [co-written by Chana Messinger and Andrew Critc...
"There’s no such thing as a tree (phylogenetically)" by Eukaryote
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/fRwdkop6tyhi3d22L/there-s-no-such-thing-as-a-tree-phylogenetically This is a linkpost for https://eukaryotewritesblog.c...
"Losing the root for the tree" by Adam Zerner
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/ma7FSEtumkve8czGF/losing-the-root-for-the-tree You know that being healthy is important. And that there's a lot of...
"It Looks Like You’re Trying To Take Over The World" by Gwern
28 Mar 2023
Contributed by Lukas
https://gwern.net/fiction/clippy In A.D. 20XX. Work was beginning. “How are you gentlemen !!”… (Work. Work never changes; work is always hell.) Sp...
"Why I think strong general AI is coming soon" by Porby
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-think-strong-general-ai-is-coming-soon I think there is little time left before someone builds ...
"What failure looks like" by Paul Christiano
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like Crossposted from the AI Alignment Forum. May contain more technical jargon th...
"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien
28 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options This is an essay about one of those "once you see it, you ...
""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon
21 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard In addition to technical challenges, plans ...
"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes
21 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/4Gt42jX7RiaNaxCwP/more-information-about-the-dangerous-capability-evaluations Crossposted from the AI Alignment Forum. ...
"Enemies vs Malefactors" by Nate Soares
14 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors Status: some mix of common wisdom (that bears repeating in our particular cont...
"The Parable of the King and the Random Process" by moridinamael
14 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process ~ A Parable of Forecasting Under Model Uncertainty ~ Yo...
"The Waluigi Effect (mega-post)" by Cleo Nardo
08 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post In this article, I will present a mechanistic explanation of the Waluigi...
"Acausal normalcy" by Andrew Critch
06 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy Crossposted from the AI Alignment Forum. May contain more technical jargon than usua...
"Please don't throw your mind away" by TsviBT
01 Mar 2023
Contributed by Lukas
https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away [Warning: the following dialogue contains an incidental spoiler for...
"Cyborgism" by Nicholas Kees & Janus
15 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism There is a lot of disagreement and confusion about the feasibility and risks associated wit...
"Childhoods of exceptional people" by Henrik Karlsson
14 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people This is a linkpost for https://escapingflatland.substack.com/p/child...
"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares
13 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making Crossposted from the AI Alignment Forum. May c...
"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça
10 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas A Chemical Hunger (a), a series by the authors...
"SolidGoldMagikarp (plus, prompt generation)"
08 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation Work done at SERI-MATS, over the past two months, by Jessica...
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares
03 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s Writing down something I’ve found myself repe...
"Basics of Rationalist Discourse" by Duncan Sabien
02 Feb 2023
Contributed by Lukas
https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1 Introduction: This post is meant to be a linkable resource. Its core ...
"My Model Of EA Burnout" by Logan Strohl
31 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout (Probably somebody else has said most of this. But I personally haven't r...
"Sapir-Whorf for Rationalists" by Duncan Sabien
31 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists Casus Belli: As I was scanning over my (rather long) list of essays-to-w...
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo
25 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers This is a linkpost for https://novum.substack.com/p/social-recess...
"Recursive Middle Manager Hell" by Raemon
24 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell I think Zvi's Immoral Mazes sequence is really important, but come...
"The Feeling of Idea Scarcity" by John Wentworth
12 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity Here’s a story you may recognize. There's a bright up-and-coming ...
"Models Don't 'Get Reward'" by Sam Ringer
12 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward Crossposted from the AI Alignment Forum. May contain more technical jargon th...
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin
12 Jan 2023
Contributed by Lukas
https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without Crossposted from the AI Alignment Forum. ...