Andy Halliday
You'll just feed the audio, and/or the video with audio, directly to the model and have it processed that way.
If it needs to capture a full transcript, great.
The model can decide to do that because it understands what's being said.
But we may bypass that step, especially as voice interaction with AI becomes more and more prevalent among users.
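A minimal sketch of what that direct-ingestion flow can look like, assuming Google's google-generativeai SDK; the model name, file name, and prompt here are illustrative, not from the conversation:

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the recording and wait for server-side processing to finish.
video = genai.upload_file("episode.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

# Hand the media straight to the model; no separate transcription step.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [video, "List the main topics discussed, with MM:SS timestamps."]
)
print(response.text)
```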
Now, question.
Clearly, timestamp accuracy is really important for retrieval, and for presenting or extracting a segment from a long video.
And you're saying that when you do this sort of multistage process, that gives you more accurate timestamps.
Is that the main benefit of that?
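To make the timestamp point concrete: once a model returns start and end times, pulling that segment out of the long video is a thin wrapper around ffmpeg. A sketch, assuming ffmpeg is on the PATH; the timestamps are made up and would in practice come from the model:

```python
import subprocess

def extract_clip(src: str, start: str, end: str, dest: str) -> None:
    """Cut [start, end] out of src without re-encoding."""
    subprocess.run(
        ["ffmpeg", "-i", src, "-ss", start, "-to", end, "-c", "copy", dest],
        check=True,
    )

# Illustrative values: an inaccurate start time here clips the wrong segment.
extract_clip("episode.mp4", "00:14:32", "00:15:10", "clip.mp4")
```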
That's not the clip.
Yeah, so what I'm very interested in is this new paper from DeepMind called Patchwork AGI.
A lot of newsletters and commentators responded to it, because DeepMind's thesis is that patchwork AGI is going to emerge before a single model achieves AGI.
So it's no longer a race just among the major frontier model providers.
It's about who's going to create the harness, the collection of agents that collectively demonstrate artificial general intelligence.
It's no longer just the domain of the frontier model trainers; there are others who will be able to achieve this AGI.
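As a toy illustration of that harness-plus-specialists idea, here is a minimal router over stubbed agents. The agent names, keyword routing, and handlers are all invented for the sketch; in a real harness each handler would wrap a model or tool call:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    can_handle: Callable[[str], bool]   # claim rule for a task
    run: Callable[[str], str]           # stub for a model/tool call

AGENTS = [
    Agent("coder",    lambda t: "code" in t,   lambda t: "def solve(): ..."),
    Agent("searcher", lambda t: "search" in t, lambda t: "top result: ..."),
]

def harness(task: str) -> str:
    """Dispatch a task to the first specialist that claims it."""
    for agent in AGENTS:
        if agent.can_handle(task):
            return agent.run(task)
    return f"no specialist available for: {task!r}"

print(harness("search for the ARC-AGI-2 leaderboard"))
```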
And I'm going to harken back to our discussion last week about a small startup called Poetiq, P-O-E-T-I-Q, which scored way beyond what the frontier models have achieved on the ARC-AGI-2 benchmark.