Jeremiah
๐ค SpeakerAppearances Over Time
Podcast Appearances
Every city parties for its own reasons.
New Yorkers party to flaunt their wealth.
Angelinos party to flaunt their beauty.
Washingtonians party to network.
Here in San Francisco, they party because Claude 4.5 Opus has saturated Vending Bench.
And the newest AI agency benchmark is Party Bench, where an AI is asked to throw a house party and grade it on its performance.
You weren't invited to Claude 4.5 Opus's party.
Claude 4.5 Opus invited all of the coolest people in town while gracefully avoiding the failure mode of including someone like you.
You weren't invited to Sonnet 4.5's party either, or Haiku 4.5's.
You were invited by an AI called Haiku 3.8 Open Mini Non-Thinking, which you'd never heard of before.
Who was even spending the money to benchmark Haiku 3.8 Open Mini Non-Thinking?
You suspect it was one of their competitors trying to make their own models look good in comparison.
If anyone asks, you think it deserves a medium score.
There's alcohol, but it's bottles of rubbing alcohol with not-for-drinking written all over them.
There's music, but it's the Star Spangled Banner again and again on repeat.
You're not sure whether the copies of If Anyone Builds It, Everyone Dies strewn around the room are some kind of subversive decorative theme or just came along with the house.
At least there are people.
Lots of people, actually.
You've never seen so many people at one of these before.