Jeremiah
๐ค SpeakerAppearances Over Time
Podcast Appearances
Washingtonians party to network.
Here in San Francisco, they party because Claude 4.5 Opus has saturated Vending Bench.
And the newest AI agency benchmark is Party Bench, where an AI is asked to throw a house party and grade it on its performance.
You weren't invited to Claude 4.5 Opus's party.
Claude 4.5 Opus invited all of the coolest people in town while gracefully avoiding the failure mode of including someone like you.
You weren't invited to Sonnet 4.5's party either, or Haiku 4.5's.
You were invited by an AI called Haiku 3.8 Open Mini Non-Thinking, which you'd never heard of before.
Who was even spending the money to benchmark Haiku 3.8 Open Mini Non-Thinking?
You suspect it was one of their competitors trying to make their own models look good in comparison.
If anyone asks, you think it deserves a medium score.
There's alcohol, but it's bottles of rubbing alcohol with not-for-drinking written all over them.
There's music, but it's the Star Spangled Banner again and again on repeat.
You're not sure whether the copies of If Anyone Builds It, Everyone Dies strewn around the room are some kind of subversive decorative theme or just came along with the house.
At least there are people.
Lots of people, actually.
You've never seen so many people at one of these before.
It takes only a few seconds to spot someone you know.
Hi, Caitlin, you say.
Can't believe so many people made it to an AI-generated event on a Tuesday night.
Yeah, usually I'm working late, but that was the bad old days before Claude Code.