Dylan Patel
That's where this whole reinforcement learning paradigm comes in.
Right, exactly.
That's where there's sort of a challenge in terms of building those environments.
So there are something like 40 startups in the Bay now building these environments, and it's questionable whether any of them will make it or what will happen.
And then these companies are also making their own environments.
But these environments can be anything and everything.
So it's as simple as: here is a fake Amazon.
Amazon's terms of service ban chat models and all these things.
But here's a fake Amazon full of items.
Figure out how to click around and purchase items.
Figure out how to compare the two items and pick.
I've generated a list of deodorants.
Three of them are fake.
One of them's real.
One of them's not the one I want.
And here's the prompt.
And it tries many things, and you vary the prompt and all these things.
But eventually it's bought the right deodorant, and you've succeeded, and you fold it back in.
That's a simple thing.
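
To make the fake-store example concrete, here is a minimal Python sketch of what such an environment might look like. Everything in it is an illustrative assumption rather than anything described in the conversation: the names `Item`, `FakeStoreEnv`, `make_env`, `run_episode`, and `collect_successes` are hypothetical, the listing is read as five items (three fake, one real but unwanted, one real and wanted), and the "model" is just a random chooser standing in for a real policy. The reward is 1.0 only when the wanted item is bought, and the successful attempts are the ones kept to be folded back into training.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    name: str
    is_real: bool    # False for decoy listings
    is_target: bool  # True only for the item the prompt asks for


@dataclass
class FakeStoreEnv:
    """Hypothetical toy storefront with exactly one correct item."""
    items: List[Item]
    purchased: Optional[Item] = None

    def purchase(self, index: int) -> float:
        """Buy the item at `index`; reward is 1.0 only for the target."""
        self.purchased = self.items[index]
        return 1.0 if self.purchased.is_target else 0.0


def make_env() -> FakeStoreEnv:
    # Assumed reading of the example: three fakes, one real item that is
    # not the one wanted, and one real item that is.
    return FakeStoreEnv(items=[
        Item("Fake deodorant A", is_real=False, is_target=False),
        Item("Fake deodorant B", is_real=False, is_target=False),
        Item("Fake deodorant C", is_real=False, is_target=False),
        Item("Real deodorant, wrong scent", is_real=True, is_target=False),
        Item("Real deodorant, requested scent", is_real=True, is_target=True),
    ])


def run_episode(env: FakeStoreEnv, rng: random.Random) -> dict:
    """One attempt. A random pick stands in for the model's behavior."""
    choice = rng.randrange(len(env.items))
    reward = env.purchase(choice)
    return {"choice": choice, "reward": reward}


def collect_successes(num_attempts: int, seed: int = 0) -> List[dict]:
    """Run many attempts and keep only the ones that earned the reward."""
    rng = random.Random(seed)
    successes = []
    for _ in range(num_attempts):
        episode = run_episode(make_env(), rng)
        if episode["reward"] == 1.0:
            successes.append(episode)
    return successes
```

A real environment would expose clicking, searching, and comparing as separate actions and would score a full trajectory; this sketch collapses all of that into a single purchase step to show only the try, check, and keep-the-successes loop.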