Zaid
Inference is also what powers AI agents to execute their tasks.
A very simplified analogy I can think of: AI training is like someone going to business school.
And then AI inference is like hiring that person to do a job and answer questions.
And this shift to inference is impacting the AI infrastructure spending.
According to research from Gartner, global capital spending on inference infrastructure is going to surpass spending on training for the first time ever this year.
And this gap is expected to widen fast.
By 2029, companies are projected to spend nearly twice as much on inference compared to training.
The thing is, these AI models are now so good that there's less urgency to train new ones.
Instead, there's more focus on getting the most out of these existing AI models.
Now, another side effect of the rise of agentic AI and the focus on inference is that it's going to exponentially increase the demand for compute.
So if you think about it, a normal AI chatbot gives you one response, but an AI agent needs to think through the task, call multiple tools, check its own work, retry if something fails, pull in outside data, and keep running in the background to complete the work.
All that requires a lot more tokens and a lot more computing power.
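To make the compute point concrete, here's a minimal sketch that counts model calls for a one-shot chatbot versus an agent loop that plans, calls tools, checks each result, and retries failures. Everything in it is hypothetical: `fake_model` stands in for a real LLM, `search_tool` and `flaky_api` are made-up tools, and tokens are approximated by word counts rather than a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool signature: (context, attempt) -> (succeeded, output text)
Tool = Callable[[str, int], tuple[bool, str]]


@dataclass
class CallLog:
    """Tallies model calls and approximate tokens for one run."""
    model_calls: int = 0
    tokens: int = 0

    def record(self, prompt: str, reply: str) -> None:
        # Word count stands in for a real tokenizer (an assumption for illustration).
        self.model_calls += 1
        self.tokens += len(prompt.split()) + len(reply.split())


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call: returns a fixed two-word reply.
    return "ok " + prompt.split()[0]


def search_tool(context: str, attempt: int) -> tuple[bool, str]:
    # Hypothetical tool that always succeeds.
    return True, "search-results"


def flaky_api(context: str, attempt: int) -> tuple[bool, str]:
    # Hypothetical tool that fails on the first attempt and succeeds on retry.
    ok = attempt > 0
    return ok, ("api-data" if ok else "error")


def chatbot(question: str, log: CallLog) -> str:
    # A chatbot answers with a single model call.
    reply = fake_model(question)
    log.record(question, reply)
    return reply


def agent(task: str, tools: list[Tool], log: CallLog, max_retries: int = 2) -> str:
    # 1. Plan: one model call to think through the task.
    plan_prompt = "plan " + task
    log.record(plan_prompt, fake_model(plan_prompt))
    context = task
    for tool in tools:
        for attempt in range(max_retries + 1):
            ok, result = tool(context, attempt)
            # 2. Check: the model reviews every tool result, including failures.
            check_prompt = f"check {result} {context}"
            log.record(check_prompt, fake_model(check_prompt))
            if ok:
                # 3. Pull the outside data into the growing context.
                context += " " + result
                break
            # 4. Otherwise loop around and retry the same tool.
    # 5. Final answer over the accumulated context.
    final_prompt = "answer " + context
    answer = fake_model(final_prompt)
    log.record(final_prompt, answer)
    return answer


chat_log, agent_log = CallLog(), CallLog()
chatbot("What is inference", chat_log)
agent("What is inference", [search_tool, flaky_api], agent_log)
print(chat_log.model_calls, chat_log.tokens)    # 1 5
print(agent_log.model_calls, agent_log.tokens)  # 5 37
```

The stub keeps the numbers tiny, but the shape is the point: every tool result, retry, and self-check is another model call, and each one runs over a context that keeps growing.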
And it also requires a different type of hardware.
Most people know about Nvidia's GPUs.
They're really good at training AI models.
Well, inference and agentic workloads are different.
Those require CPUs working alongside GPUs to get the maximum performance.
The CPU acts as the brain that plans and executes multi-step tasks.
It calls external tools like APIs and databases and handles memory management.
And that's why Nvidia now makes its own CPUs alongside its GPUs.