Mustafa Suleyman
And prices have come down 100x on inference in the last two years, which is a crazy thought: the cost of producing a single token.
And that's a combination of all the chips and the stack, the data centers, but also just all of the algorithmic improvements and stuff.
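As a rough sketch of what that figure implies (taking the 100x-over-two-years number at face value), the annualized decline works out like this:

```python
# Hedged back-of-the-envelope: if inference cost per token fell ~100x
# over 2 years, the implied decline factor per year is the square root.
total_drop = 100.0
years = 2.0
annual_factor = total_drop ** (1.0 / years)
print(f"~{annual_factor:.0f}x cheaper per year")  # ~10x cheaper per year
```

In other words, a 100x drop over two years is consistent with costs falling roughly an order of magnitude every year, compounding across chips, data centers, and algorithmic improvements.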
So I think, naturally, with any exponential technology, and this is probably the most exponential technology you've ever seen, you get this compound effect of all these components. They reduce the cost and accelerate the proliferation, and then the use cases go through the roof, because people are randomly discovering things that no one ever even conceived of when the technologies were originally designed. And then you get that feedback loop that drives the next round of efficiencies. I'm very excited about our new chip, Maia.
Microsoft AI Accelerator, M-A-I-A.
And so this is a competitor to the NVIDIA Blackwell chip or the Google TPU or the Amazon Trainium.
And it's been specifically designed for inference.
Like we realized a few years ago that it was critical to focus on serving rather than training, which is quite a different workload.
And it's a phenomenal chip.
I mean, it's 30% more performant than any other chip that we've got in our data centers today made by any provider for these serving workloads.
It's 3x better than the Amazon Trainium chip.
It's a little bit better than the Google TPU V7 for FP4.
So it's just a huge deal.
And it's just kind of awesome to see that the cost curve is just going to continue coming down.
I would say social intelligence.
So we've had IQ, EQ, and AQ.
Last year was AQ, the previous years were IQ and EQ.
And AQ is agentic intelligence or?
Actions.
I basically made it up a few years ago as a framework.