And even there, like, especially once it's spotted, it's like, oh, this is a key part of its persona.
I see this Oxford paper.
What if I change Oxford to Stanford?
What if I now say Richard Feynman really likes this thing?
And it's, like, really carving out the hypothesis space and testing things in a way that I'm kind of surprised by.
Make number go down.
Just flip the sign.
One prediction I have is that we're going to move away from "can an agent do X, Y, Z" and more towards: can I efficiently deploy and launch 100 agents, give them the feedback they need, and even just be able to easily verify what they're up to, right?
There's this generator-verifier gap that people talk about, where it's much easier to check something than it is to produce the solution on your own.
Yeah.
It's very plausible to me that we'll be at the point where it's so easy to generate with these agents that the bottleneck is actually: can I, as the human, verify the answer?
And again, you're guaranteed to get an answer with these things.
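As a toy illustration of that asymmetry (my example, not from the conversation): checking a proposed answer can be a single cheap operation, while producing it requires a search.

```python
import math

def verify_factorization(n: int, factors: list[int]) -> bool:
    """Cheap check: multiply the proposed factors back together."""
    return math.prod(factors) == n and all(f > 1 for f in factors)

def generate_factorization(n: int) -> list[int]:
    """Expensive direction: trial division to actually find the factors."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

n = 2_147_483_647 * 999_983            # product of two primes
candidate = generate_factorization(n)  # slow: has to search
assert verify_factorization(n, candidate)  # fast: one multiplication
```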
And so ideally you have some automated way to evaluate it and assign a score for how well it worked, how well this thing generalized.
And at a minimum, you have a way to easily summarize what a bunch of agents are finding.
And it's like, OK, well, if 20 of my 100 agents all found this one thing, then it has a higher chance of being true.
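A minimal sketch of that fan-out-and-aggregate pattern, with hypothetical stand-ins (run_agent and score_result are placeholders, not any real agent API): launch many agents on the same task, keep only results that pass an automated check, and surface the findings that independent agents converge on.

```python
import asyncio
import random
from collections import Counter

# Hypothetical stand-ins: in practice run_agent would call your agent
# framework and score_result would be your automated eval / test harness.
async def run_agent(task: str, agent_id: int) -> str:
    await asyncio.sleep(0)  # placeholder for real agent work
    return random.choice(["finding A", "finding B", "finding C"])

def score_result(finding: str) -> float:
    return 1.0  # placeholder: e.g. fraction of tests passed

async def fan_out(task: str, n_agents: int = 100, min_agreement: int = 20):
    # Launch all agents on the same task concurrently.
    findings = await asyncio.gather(*(run_agent(task, i) for i in range(n_agents)))
    # Keep only results that pass the automated verifier.
    verified = [f for f in findings if score_result(f) >= 0.5]
    # Findings that many independent agents converge on are more likely real.
    counts = Counter(verified)
    return [(f, n) for f, n in counts.most_common() if n >= min_agreement]

print(asyncio.run(fan_out("investigate the regression in eval X")))
```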
Yeah, but just to be really concrete, or pedantic, about the bottlenecks here: a lot of it is, again, just tooling and whether the pipes are connected.
There are a lot of things where I can't just launch Claude and have it go and solve them, because maybe it needs a GPU,
or maybe I need very careful permissioning so that it can't just take over an entire cluster and launch a whole bunch of things, right?
So you really do need good sandboxing and the ability to use all of the tools that are necessary.
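To make the permissioning point concrete, here is a minimal sketch of the kind of gate you might put between an agent and the machine; the allowlist and blocked commands are purely illustrative assumptions, not any particular sandbox's real policy.

```python
import shlex
import subprocess

# Illustrative policy: only these commands may run inside the agent's
# sandbox, and cluster-level operations are explicitly blocked.
ALLOWED_COMMANDS = {"python", "pytest", "git", "ls", "cat"}
BLOCKED_SUBSTRINGS = ("sbatch", "srun", "kubectl")  # cluster schedulers

def run_in_sandbox(command: str, timeout_s: int = 300) -> subprocess.CompletedProcess:
    """Run an agent-issued command only if it passes the permission checks."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {argv[:1]}")
    if any(s in command for s in BLOCKED_SUBSTRINGS):
        raise PermissionError("cluster-level operations are blocked")
    # A real sandbox would also isolate the filesystem, network, and GPUs;
    # this only gates which binaries the agent can invoke.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
```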
But I think part of it is, is it async or not?