In this eye-opening deep dive, we unravel the surprising challenge that’s stumping even the most advanced AI systems today: web browsing. Yes, the thing you do every day, searching the internet, is still one of AI’s biggest hurdles. Enter BrowseComp, a brutal new benchmark created by OpenAI to test whether AI agents can actually navigate the messy, tangled reality of the web to find “needle-in-a-haystack” information.

We break down what BrowseComp really is: 1,266 handcrafted, devilishly specific tasks designed to test AI’s persistence, creativity, and judgment online. Think: finding a soccer match with an exact number of yellow cards and a referee from a specific country, or tracking down a research paper by identifying the undergrad schools of its authors. These aren’t trivia questions. They’re research problems.

The results? Shocking. Top language models like GPT-4o scored under 2%, even with browsing tools. The only model that crossed the human threshold was OpenAI’s specialized “Deep Research” agent, designed specifically for this kind of task, and even it struggled with calibration and overconfidence.

We dig into the wild strategies behind the benchmark (like building questions backwards from known answers), the tension between reasoning and searching, and why just giving an AI “access to the internet” is nowhere near enough. And here’s the kicker: humans struggled too, with most giving up after hours of searching.

This episode isn’t just about AI benchmarks. It’s about how complex our information landscape really is, and what it says about the future of AI as a research partner. Will we learn to trust AIs to think for us online? Or are we still the best search engines we’ve got?

Listen now to explore the future of AI, web literacy, and the art of finding what matters in an overwhelming digital world.