Michael Truell
And so the public aspect of these things is also just tricky.
I think that, given the dearth of benchmarks, there are a few interesting crutches that places building systems with these models, or building the models themselves, actually use to get a sense of whether they're going in the right direction. And in a lot of places, people will actually just have humans play with these things and give qualitative feedback on them.
Like, at one or two of the foundation model companies, they have people for whom that's a big part of their role. And, you know, internally we also qualitatively assess these models and actually lean on that a lot, in addition to the private evals that we have. It's like the vibe. The vibe, yeah. It's like the vibe.
Don't you think this gets a little at some of the stuff you were talking about earlier, the difficulty of specifying intent for what you want from software? Sometimes, because the intent is really hard to specify, it's also going to be really hard to prove that the software actually matches whatever your intent is.
Yeah, or even for a given spec, maybe you can... I think there's a question of whether you can actually do the formal verification. Is that possible? I think there's more to dig into there.
Even if you have the spec? Is the spec written in natural language? Yeah, how do you map the spec?
But how easy would that be to draw? So then I think you care about things that are not going to be easily or well specified in the spec language.