EA Forum Podcast (All audio)

[Linkpost] “Q2 AI Benchmark Results: Pros Maintain Clear Lead” by Benjamin Wilson 🔸, johnbash, Metaculus

28 Oct 2025

Description

This is a link post. By Ben Wilson and John Bash from Metaculus.

Main Takeaways

Top Findings

Pro forecasters significantly outperform bots: Our team of 10 Metaculus Pro Forecasters demonstrated superior performance compared to the top-10 bot team, with strong statistical significance (p = 0.00001) based on a one-sided t-test on Peer scores.

The bot team did not improve significantly in Q2 relative to the human Pro team: The bot team's head-to-head score against Pros was -11.3 in Q3 2024 (95% CI: [-21.8, -0.7]), then -8.9 in Q4 2024 (95% CI: [-18.8, 1]), then -17.7 in Q1 2025 (95% CI: [-28.3, -7.0]), and now -20.03 (95% CI: [-28.63, -11.41]), with no clear trend emerging. (Reminder: a lower head-to-head score indicates worse relative accuracy. A score of 0 corresponds to equal accuracy.)

Other Takeaways

This quarter's winning bot is open-source: Q2 Winner Panshul has very generously made his bot open-source. The bot writes separate “outside view” and “inside view” [...]

Outline:
(00:20) Main Takeaways
(03:24) Introduction
(04:30) Methodology
(13:59) How do LLMs Compare?
(17:18) Which Bot Strategy is Best?
(23:04) Are Bots Better than Human Pros?
(25:38) Binary vs Numeric vs Multiple Choice Questions
(27:07) Team Performance Over Quarters
(31:14) Bot Maker Survey
(31:40) Best practices of the best-performing bots
(38:27) Other Survey Results
(41:32) How did scaffolding do?
(45:33) Advice from Bot Makers
(53:48) Links to Code and Data
(54:56) Future AI Benchmarking Tournaments

First published: October 28th, 2025
Source: https://forum.effectivealtruism.org/posts/F2stjK9wHSy3HPEC9/q2-ai-benchmark-results-pros-maintain-clear-lead
Linkpost URL: https://www.metaculus.com/notebooks/40456/q2-ai-benchmark-results/

Narrated by TYPE III AUDIO.
