
LessWrong (Curated & Popular)

Technology Society & Culture

Episodes

Showing 701-800 of 805
«« ← Prev Page 8 of 9 Next → »»

"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

04 Sep 2023

Contributed by Lukas

In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al., 2022), the authors studied language model "sycophancy"...

"Dear Self; we need to talk about ambition" by Elizabeth

30 Aug 2023

Contributed by Lukas

I keep seeing advice on ambition, aimed at people in college or early in their career, that would have been really bad for me at similar ages. Rather ...

"Assume Bad Faith" by Zack_M_Davis

28 Aug 2023

Contributed by Lukas

I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the...

"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon

28 Aug 2023

Contributed by Lukas

The Carving of Reality, the third volume of the Best of LessWrong books, is now available on Amazon (US). The Carving of Reality includes 43 essays from 29 ...

"Large Language Models will be Great for Censorship" by Ethan Edwards

23 Aug 2023

Contributed by Lukas

LLMs can do many incredible things. They can generate unique creative content, carry on long conversations in any number of subjects, complete complex...

"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov

22 Aug 2023

Contributed by Lukas

Intro: I am a psychotherapist, and I help people working on AI safety. I noticed patterns of mental health issues highly specific to this group. It'...

"Ten Thousand Years of Solitude" by agp

22 Aug 2023

Contributed by Lukas

This is a linkpost for the article "Ten Thousand Years of Solitude", written by Jared Diamond for Discover Magazine in 1993, four years befo...

"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël

21 Aug 2023

Contributed by Lukas

I gave a talk about the different risk models, followed by an interpretability presentation, then I got a problematic question, "I don't und...

"Inflection.ai is a major AGI lab" by Nikola

15 Aug 2023

Contributed by Lukas

Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude as Meta, OpenAI, Dee...

"Feedbackloop-first Rationality" by Raemon

15 Aug 2023

Contributed by Lukas

I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teachin...

"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

09 Aug 2023

Contributed by Lukas

TL;DR: This document lays out the case for research on “model organisms of misalignment” – in vitro demonstrations of the kinds of failures that...

"When can we trust model evaluations?" by evhub

09 Aug 2023

Contributed by Lukas

In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to re...

"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes

04 Aug 2023

Contributed by Lukas

Blogpost version | Paper. We have just released our first public report. It introduces methodology for assessing the capacity of LLM agents to acquire reso...

"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long

04 Aug 2023

Contributed by Lukas

Summary of Argument: The public debate among AI experts is confusing because there are, to a first approximation, three sides, not two sides to the de...

"My current LK99 questions" by Eliezer Yudkowsky

04 Aug 2023

Contributed by Lukas

So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors a...

"Thoughts on sharing information about language model capabilities" by paulfchristiano

02 Aug 2023

Contributed by Lukas

I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduc...

"Cultivating a state of mind where new ideas are born" by Henrik Karlsson

31 Jul 2023

Contributed by Lukas

In the early 2010s, a popular idea was to provide coworking spaces and shared living to people who were building startups. That way the founders would...

"Self-driving car bets" by paulfchristiano

31 Jul 2023

Contributed by Lukas

This month I lost a bunch of bets. Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023...

"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth

31 Jul 2023

Contributed by Lukas

Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That’...

"Grant applications and grand narratives" by Elizabeth

28 Jul 2023

Contributed by Lukas

The Lightspeed application asks:  “What impact will [your project] have on the world? What is your project’s goal, how will you know if you’ve ...

"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel

28 Jul 2023

Contributed by Lukas

Previously Jacob Cannell wrote the post "Brain Efficiency", which makes several radical claims: that the brain is at the Pareto frontier of s...

"Rationality !== Winning" by Raemon

28 Jul 2023

Contributed by Lukas

I think "Rationality is winning" is a bit of a trap. (The original phrase is notably "rationality is systematized winning", which...

"Cryonics and Regret" by MvB

28 Jul 2023

Contributed by Lukas

This post is not about arguments in favor of or against cryonics. I would just like to share a particular emotional response of mine as the topic beca...

"Unifying Bargaining Notions (2/2)" by Diffractor

12 Jun 2023

Contributed by Lukas

Alright, time for the payoff: unifying everything discussed in the previous post. This post is a lot more mathematically dense; you might want to dige...

"The ants and the grasshopper" by Richard Ngo

06 Jun 2023

Contributed by Lukas

Inspired by Aesop, Soren Kierkegaard, Robin Hanson, sadoeuphemist and Ben Hoffman. One winter a grasshopper, starving and frail, approaches a colony of...

"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.

18 May 2023

Contributed by Lukas

Summary: We demonstrate a new scalable way of interacting with language models: adding certain activation vectors into forward passes. Essentially, we...

"An artificially structured argument for expecting AGI ruin" by Rob Bensinger

16 May 2023

Contributed by Lukas

Philosopher David Chalmers asked: "Is there a canonical source for "the argument for AGI ruin" somewhere, preferably laid out as an exp...

"How much do you believe your results?" by Eric Neyman

10 May 2023

Contributed by Lukas

You are the director of a giant government research program that’s conducting randomized controlled trials (RCTs) on two thousand health interventio...

"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango

27 Apr 2023

Contributed by Lukas

This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintai...

"On AutoGPT" by Zvi

19 Apr 2023

Contributed by Lukas

The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all...

"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky

12 Apr 2023

Contributed by Lukas

(Related text posted to Twitter; this version is edited and has a more advanced final section.) Imagine yourself in a box, trying to predict the next w...

"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares

05 Apr 2023

Contributed by Lukas

https://www.lesswrong.com/posts/fJBTRa7m7KnCDdzG5/a-stylized-dialogue-on-john-wentworth-s-claims-about-markets
(This is a stylized version of a real co...

"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

05 Apr 2023

Contributed by Lukas

https://www.lesswrong.com/posts/iy2o4nQj9DnQD7Yhj/discussion-with-nate-soares-on-a-key-alignment-difficulty
Crossposted from the AI Alignment Forum. Ma...

"Deep Deceptiveness" by Nate Soares

05 Apr 2023

Contributed by Lukas

https://www.lesswrong.com/posts/XWwvwytieLtEWaFJX/deep-deceptiveness
This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment)...

"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch

28 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/nTGEeRSZrfPiJwkEc/the-onion-test-for-personal-and-institutional-honesty
[co-written by Chana Messinger and Andrew Critc...

"There’s no such thing as a tree (phylogenetically)" by Eukaryote

28 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/fRwdkop6tyhi3d22L/there-s-no-such-thing-as-a-tree-phylogenetically
This is a linkpost for https://eukaryotewritesblog.c...

"Losing the root for the tree" by Adam Zerner

28 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/ma7FSEtumkve8czGF/losing-the-root-for-the-tree
You know that being healthy is important. And that there's a lot of...

"It Looks Like You’re Trying To Take Over The World" by Gwern

28 Mar 2023

Contributed by Lukas

https://gwern.net/fiction/clippy
In A.D. 20XX. Work was beginning. “How are you gentlemen !!”… (Work. Work never changes; work is always hell.) Sp...

"Why I think strong general AI is coming soon" by Porby

28 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-think-strong-general-ai-is-coming-soon
I think there is little time left before someone builds ...

"What failure looks like" by Paul Christiano

28 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like
Crossposted from the AI Alignment Forum. May contain more technical jargon th...

"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien

28 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options
This is an essay about one of those "once you see it, you ...

""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon

21 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard
In addition to technical challenges, plans ...

"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes

21 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/4Gt42jX7RiaNaxCwP/more-information-about-the-dangerous-capability-evaluations
Crossposted from the AI Alignment Forum. ...

"Enemies vs Malefactors" by Nate Soares

14 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors
Status: some mix of common wisdom (that bears repeating in our particular cont...

"The Parable of the King and the Random Process" by moridinamael

14 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process
~ A Parable of Forecasting Under Model Uncertainty ~ Yo...

"The Waluigi Effect (mega-post)" by Cleo Nardo

08 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
In this article, I will present a mechanistic explanation of the Waluigi...

"Acausal normalcy" by Andrew Critch

06 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy
Crossposted from the AI Alignment Forum. May contain more technical jargon than usua...

"Please don't throw your mind away" by TsviBT

01 Mar 2023

Contributed by Lukas

https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away
[Warning: the following dialogue contains an incidental spoiler for...

"Cyborgism" by Nicholas Kees & Janus

15 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism
There is a lot of disagreement and confusion about the feasibility and risks associated wit...

"Childhoods of exceptional people" by Henrik Karlsson

14 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people
This is a linkpost for https://escapingflatland.substack.com/p/child...

"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares

13 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making
Crossposted from the AI Alignment Forum. May c...

"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça

10 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas
A Chemical Hunger (a), a series by the authors...

"SolidGoldMagikarp (plus, prompt generation)"

08 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
Work done at SERI-MATS, over the past two months, by Jessica...

"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares

03 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s
Writing down something I’ve found myself repe...

"Basics of Rationalist Discourse" by Duncan Sabien

02 Feb 2023

Contributed by Lukas

https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1
Introduction: This post is meant to be a linkable resource. Its core ...

"My Model Of EA Burnout" by Logan Strohl

31 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout
(Probably somebody else has said most of this. But I personally haven't r...

"Sapir-Whorf for Rationalists" by Duncan Sabien

31 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists
Casus Belli: As I was scanning over my (rather long) list of essays-to-w...

"The Social Recession: By the Numbers" by Anton Stjepan Cebalo

25 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers
This is a linkpost for https://novum.substack.com/p/social-recess...

"Recursive Middle Manager Hell" by Raemon

24 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell
I think Zvi's Immoral Mazes sequence is really important, but come...

"The Feeling of Idea Scarcity" by John Wentworth

12 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity
Here’s a story you may recognize. There's a bright up-and-coming ...

"Models Don't 'Get Reward'" by Sam Ringer

12 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward
Crossposted from the AI Alignment Forum. May contain more technical jargon th...

"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin

12 Jan 2023

Contributed by Lukas

https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without
Crossposted from the AI Alignment Forum. ...

"The next decades might be wild" by Marius Hobbhahn

21 Dec 2022

Contributed by Lukas

https://www.lesswrong.com/posts/qRtD4WqKRYEtT5pi3/the-next-decades-might-be-wild
Crossposted from the AI Alignment Forum. May contain more technical ja...

"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn

17 Nov 2022

Contributed by Lukas

https://www.lesswrong.com/posts/SqjQFhn5KTarfW8v7/lessons-learned-from-talking-to-greater-than-100-academics
Crossposted from the AI Alignment Forum. M...

"How my team at Lightcone sometimes gets stuff done" by jacobjacob

10 Nov 2022

Contributed by Lukas

https://www.lesswrong.com/posts/6LzKRP88mhL9NKNrS/how-my-team-at-lightcone-sometimes-gets-stuff-done
Disclaimer: I originally wrote this as a private d...

"Decision theory does not imply that we get to have nice things" by So8res

08 Nov 2022

Contributed by Lukas

https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice
Crossposted from the AI Alignment Forum. May ...

"What 2026 looks like" by Daniel Kokotajlo

07 Nov 2022

Contributed by Lukas

https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like#2022
Crossposted from the AI Alignment Forum. May contain more technical jargon ...

"Counterarguments to the basic AI x-risk case" by Katja Grace

04 Nov 2022

Contributed by Lukas

"Introduction to abstract entropy" by Alex Altair

29 Oct 2022

Contributed by Lukas

https://www.lesswrong.com/posts/REA49tL5jsh69X3aM/introduction-to-abstract-entropy#fnrefpi8b39u5hd7
This post, and much of the following sequence, was ...

"Consider your appetite for disagreements" by Adam Zerner

25 Oct 2022

Contributed by Lukas

https://www.lesswrong.com/posts/8vesjeKybhRggaEpT/consider-your-appetite-for-disagreements
Poker: There was a time about five years ago where I was tryin...

"My resentful story of becoming a medical miracle" by Elizabeth

21 Oct 2022

Contributed by Lukas

https://www.lesswrong.com/posts/fFY2HeC9i2Tx8FEnK/my-resentful-story-of-becoming-a-medical-miracle
This is a linkpost for https://acesounderglass.com/2...

"The Redaction Machine" by Ben

02 Oct 2022

Contributed by Lukas

https://www.lesswrong.com/posts/CKgPFHoWFkviYz7CB/the-redaction-machine
On the 3rd of October 2351 a machine flared to life. Huge energies coursed into...

"Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" by Ajeya Cotra

27 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to
Crossposted from the AI Alignment Forum. May con...

"The shard theory of human values" by Quintin Pope & TurnTrout

22 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values
TL;DR: We propose a theory of human value formation. According to th...

"Two-year update on my personal AI timelines" by Ajeya Cotra

22 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines#fnref-fwwPpQFdWM6hJqwuY-12
Crossposted from the AI Alignm...

"You Are Not Measuring What You Think You Are Measuring" by John Wentworth

21 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring
Eight years ago, I worked as a data scientist ...

"Do bamboos set themselves on fire?" by Malmesbury

20 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/WNpvK67MjREgvB8u8/do-bamboos-set-themselves-on-fire
Cross-posted from Telescopic Turnip. As we all know, the best place ...

"Survey advice" by Katja Grace

18 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/oyKzz7bvcZMEPaDs6/survey-advice
Things I believe about making surveys, after making some surveys: If you write a questio...

"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith

18 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/J3wemDGtsy5gzD3xa/toni-kurz-and-the-insanity-of-climbing-mountains
Content warning: death. I've been on a YouTube bi...

"Deliberate Grieving" by Raemon

18 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/gs3vp3ukPbpaEie5L/deliberate-grieving-1
This post is hopefully useful on its own, but begins a series ultimately abou...

"Toolbox-thinking and Law-thinking" by Eliezer Yudkowsky

15 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/s/6xgy8XYEisLk3tCjH/p/CPP2uLcaywEokFKQG
TL;DR: I've noticed a dichotomy between "thinking in toolboxes" and "...

"Local Validity as a Key to Sanity and Civilization" by Eliezer Yudkowsky

15 Sep 2022

Contributed by Lukas

"Humans are not automatically strategic" by Anna Salamon

15 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/PBRWb2Em5SNeWYwwB/humans-are-not-automatically-strategic
Reply to: A "Failure to Evaluate Return-on-Time" Fal...

"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC

15 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next
Crossposted from the AI Alignment Forum...

"Moral strategies at different capability levels" by Richard Ngo

14 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/jDQm7YJxLnMnSNHFu/moral-strategies-at-different-capability-levels
Crossposted from the AI Alignment Forum. May contain ...

"Worlds Where Iterative Design Fails" by John Wentworth

11 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails
Crossposted from the AI Alignment Forum. May contain more technic...

"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

11 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is
Despite a clear need for it, a good sourc...

"Unifying Bargaining Notions (1/2)" by Diffractor

09 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/rYDas2DDGGDRc8gGB/unifying-bargaining-notions-1-2
Crossposted from the AI Alignment Forum. May contain more technical j...

"Simulators" by Janus

05 Sep 2022

Contributed by Lukas

https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators#fncrt8wagfir9
Summary. TL;DR: Self-supervised learning may create AGI or its foundation. Wha...

"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope

08 Aug 2022

Contributed by Lukas

https://www.lesswrong.com/posts/CjFZeDD6iCnNubDoS/humans-provide-an-untapped-wealth-of-evidence-about#fnref7a5ti4623qb
Crossposted from the AI Align...

"Changing the world through slack & hobbies" by Steven Byrnes

30 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/DdDt5NXkfuxAnAvGJ/changing-the-world-through-slack-and-hobbies
Introduction: In EA orthodoxy, if you're really...

"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch

28 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory
Crossposted from the AI Alignment Foru...

"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger

24 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/MdZyLnLHuaHrCskjy/itt-passing-and-civility-are-good-charity-is-bad
I often object to claims like "charity/steelm...

"What should you change in response to an "emergency"? And AI risk" by Anna Salamon

23 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/mmHctwkKjpvaQdC3c/what-should-you-change-in-response-to-an-emergency-and-ai
Related to: Slack gives you the ability ...

"On how various plans miss the hard bits of the alignment challenge" by Nate Soares

17 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment
Crossposted from the AI Alignment Forum....

"Humans are very reliable agents" by Alyssa Vance

13 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/28zsuPaJpKAGSX4zq/humans-are-very-reliable-agents
Over the last few years, deep-learning-based AI has progressed ext...

"Looking back on my alignment PhD" by TurnTrout

08 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/2GxhAyn9aHqukap2S/looking-back-on-my-alignment-phd
The funny thing about long periods of time is that they do, eventu...

"It’s Probably Not Lithium" by Natália Coelho Mendonça

05 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/7iAABhWpcGeP5e6SB/it-s-probably-not-lithium
A Chemical Hunger (a), a series by the authors of the blog Slime Mold Tim...

"What Are You Tracking In Your Head?" by John Wentworth

02 Jul 2022

Contributed by Lukas

https://www.lesswrong.com/posts/bhLxWTkRc8GXunFcB/what-are-you-tracking-in-your-head
A large chunk - plausibly the majority - of real-world experti...

"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood

29 Jun 2022

Contributed by Lukas

https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security
Background: I have been doing red team, blu...
