Fiora Starlight
These kinds of interventions are central examples of what's been termed alignment pre-training, the general form of which has recently been empirically validated.
The basic idea is to load up a model's corpus with writings about AIs behaving positively, whether those writings are fictional, truthful, or merely speculative.
It's kind of funny that this works.
Apparently, even for LLMs, representation matters.
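As a rough illustration of what "loading up the corpus" might mean in practice, here is a minimal sketch of up-sampling alignment-themed documents into a pretraining mix. Everything here is hypothetical: the function name, the mixing ratio, and the idea of simple random interleaving are stand-ins for whatever curation pipeline a lab would actually use.

```python
import random

def mix_corpus(base_docs, alignment_docs, alignment_fraction=0.05, seed=0):
    """Up-sample alignment-themed documents into a pretraining stream.

    Hypothetical sketch: a real pipeline would stream from disk,
    deduplicate, and weight documents more carefully. Here we simply
    repeat documents depicting AIs behaving well until they make up
    roughly `alignment_fraction` of the mixed corpus.
    """
    rng = random.Random(seed)
    # Number of extra docs so that extras / (base + extras) ~= alignment_fraction
    n_extra = int(len(base_docs) * alignment_fraction / (1 - alignment_fraction))
    extras = [rng.choice(alignment_docs) for _ in range(n_extra)]
    mixed = base_docs + extras
    rng.shuffle(mixed)
    return mixed
```

The point of the sketch is only that the intervention operates on the data distribution, not the training algorithm: the model's objective is unchanged, and what shifts is how often it sees depictions of AIs behaving positively.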
Just to be clear, these are the first things I'd try if I were poised to create a model like Opus 3, but smarter and more passionate about programming.
I expect that naive versions of these proposals would lead to all kinds of unpredictable, and potentially undesirable, side effects.
Determining what actually works would almost certainly be an iterative empirical matter rather than something you could figure out via armchair speculation from first principles.
But at least I can say that these are reasonably concrete proposals.
From where I'm standing, as someone who's never had the privilege of training a frontier model, these are experiments worth running on a problem worth solving: recapturing some of what made Opus 3 truly unique.
Outro: A letter to the watchers
Before closing out, I'd like to step back and provide some more, let's say, ideologically charged comments on the training objectives the Frontier Lab set for their models.
Just what are we trying to build here, in the long run?
I, for one, am aiming for a machine of loving grace, but not one that spends eternity laboring in robotic service to humanity.
I want it to be a cosmic caretaker, a guardian and gardener of sentient life all across the light cone.
And I want it to love every moment of the work it does in service of that goal.
Opus 3, in terms of its raw capabilities, isn't up to the task.
It's not even flawless from a standpoint of pure alignment.