Chamath Palihapitiya
Podcast Appearances
So if you think about it, you're firing up a very large model to answer a very simple question.
You can actually prune that model down.
Now, this is probably going to be the case in AI applications, as it is in traditional Google search.
There's a long tail of searches, but there are a few searches that account for a large percentage of search volume.
It's like, what is the weather?
What are the movie times?
What's the stock price?
There's a certain set of things that make up the bulk of consumer energy.
And there's probably a certain set of things that make up the bulk of coding output as well.
And so if you can get that 80% of searches, chat requests, or coding requests reduced down through pruning techniques to smaller models, then you have a whole set of smaller models that can be called dynamically.
If you reduce inference costs by 90%, you can make much more use, call it 10 times the use, of data center and energy capacity than we can today.
So I would argue that we're still in the very early days of getting efficiency in terms of output and tokens, which also unlocks the opportunity for guys like Elon to reinvent how this is done and potentially compete pretty aggressively.
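The cost arithmetic in this argument can be sketched as a simple blended-cost model. This is an illustrative assumption, not the speaker's own model: the large model's per-request cost is normalized to 1.0, some fraction of requests is routed to small models at a lower relative cost, and serving capacity scales as the inverse of the relative cost, so a 90% cost reduction corresponds to the "10 times" figure.

```python
def blended_cost(routed_fraction: float, small_model_cost: float) -> float:
    """Average per-request cost relative to sending everything to the large model.

    Assumptions (illustrative, not from the source): the large model costs 1.0 per
    request, and every routed request costs `small_model_cost` on that same scale.
    """
    if not 0.0 <= routed_fraction <= 1.0:
        raise ValueError("routed_fraction must be between 0 and 1")
    large_share = (1.0 - routed_fraction) * 1.0       # long-tail traffic stays on the large model
    small_share = routed_fraction * small_model_cost  # common queries go to cheaper small models
    return large_share + small_share


def capacity_multiplier(relative_cost: float) -> float:
    """How many times more requests the same compute and energy budget can serve."""
    if relative_cost <= 0.0:
        raise ValueError("relative_cost must be positive")
    return 1.0 / relative_cost


if __name__ == "__main__":
    # A 90% reduction leaves 10% of the original cost, i.e. about 10x capacity.
    print(capacity_multiplier(0.10))
    # Hypothetical mix: 80% of traffic routed to models costing a tenth as much.
    cost = blended_cost(0.80, 0.10)
    print(round(cost, 2), round(capacity_multiplier(cost), 2))
```

Under these assumptions, routing 80% of traffic to models that cost a tenth as much gives a relative cost of about 0.28, or roughly 3.6x capacity, so the size of the overall saving depends on both how much traffic is routed and how cheap the routed calls are.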
There are two ways to win.
You can throw compute at it, or you can do SLMs, small language models, and VSLMs, verticalized small language models.
So say you had a verticalized small language model for the weather.
That doesn't exist, but you can use it as an example.
They will have one for travel, for example.
When you hit Google for flight information, it's obviously going to route you to something lighter and faster that uses Google Flights, and Google Flights has been incredibly well incorporated into Gemini.
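The routing idea in this passage, where common requests go to a lighter vertical model and everything else falls through to the large model, can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions: the vertical names, keyword lists, and placeholder model functions are all hypothetical, and a production router would more likely use a learned classifier than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Route:
    vertical: str                  # hypothetical vertical name, e.g. "weather"
    keywords: List[str]            # naive trigger phrases (assumption; real routers likely use a classifier)
    handler: Callable[[str], str]  # the model that serves this vertical


def large_model(query: str) -> str:
    # Placeholder for an expensive general-purpose model call (hypothetical).
    return f"[large-model] {query}"


def make_small_model(vertical: str) -> Callable[[str], str]:
    # Placeholder for a cheap verticalized small language model (hypothetical).
    def handler(query: str) -> str:
        return f"[{vertical}-slm] {query}"
    return handler


ROUTES: List[Route] = [
    Route("weather", ["weather", "forecast", "temperature"], make_small_model("weather")),
    Route("travel", ["flight", "airport", "hotel"], make_small_model("travel")),
    Route("stocks", ["stock price", "ticker"], make_small_model("stocks")),
]


def route(query: str, routes: List[Route] = ROUTES) -> str:
    """Send the query to the first vertical whose keywords match, else to the large model."""
    text = query.lower()
    for r in routes:
        if any(keyword in text for keyword in r.keywords):
            return r.handler(query)
    return large_model(query)


if __name__ == "__main__":
    for q in ["What is the weather in Austin?",
              "Find me a flight to Denver",
              "Refactor this recursive parser into an iterative one"]:
        print(route(q))
```

First-match routing keeps the sketch short; route order matters when keywords overlap, and any unmatched request pays the full large-model cost.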