Simon Willison
That does also tie into o1, these new inference-scaling language models that we're getting. The one that did well on the ARC-AGI test, o3, that was basically brute force, right? It tries loads and loads and loads of different potential strategies for solving a puzzle, it figures out which one works, and it spends a million dollars on electricity to do it.
But it did kind of work, you know?
Okay. I've got one thing I do want to recommend for test-time compute. I've been calling it inference scaling; it's the same idea. There is an Alibaba model from Qwen, their Qwen research team, called QwQ, which you can run on your laptop. I've run it on my Mac, and it does the thing.
It does the thing where you give it a puzzle and it thinks. It outputs, like, sometimes dozens of paragraphs of text about how it's thinking before it gets to an answer. And watching it do that is incredibly entertaining. But the best thing about it is that occasionally it switches into Chinese. I've had my laptop think out loud in Chinese before it got to an answer.
So I asked it a question in English, it thought in Chinese for quite a while, and then it gave me an English answer. And that is just delightful.
Right. So what's not to love about seeing your laptop just do that on its own?
It is scoring higher than any of the other open-weights models. It is also, like, 685 billion parameters, so it's not easy to run. This needs data-center hardware to run it. But yeah, the benchmarks are all very impressive. It's beating the previous best one, which I think was Meta's Llama 405B. This one's, what, 685B or something? It's very good.
The thing that shocks me is, because DeepSeek have a good reputation, they've released some good models in the past. The fact that they did it for $5.5 million, that's like an eleventh of the price of the closest Meta model that Meta have documented their spending on. It's just astonishing. Yeah.
I mean, one thing I do want to highlight is that last year was the year of inference compute efficiency. Like, at the beginning of the year, the OpenAI models were literally about 100 times less expensive to run a prompt through than they were two and a half years ago.
Like, all of the providers are in this race to the bottom in terms of how much they charge per token, but it's a race based on efficiency. Like, I checked, and Google Gemini and Amazon Nova are both the cheapest hosted models, or two of the cheapest, and they're not doing it at a loss. They are at least charging you more than it costs them in electricity to run your prompt.