Simon Willison
That does also tie into o1, these new inference-scaling language models that we're getting. The one that did well on the ARC-AGI test, o3, that was basically brute force, right? It tries loads and loads and loads of different potential strategies for solving a puzzle, it figures out which one works, and it spends a million dollars on electricity to do it.
But it did kind of work, you know?
Okay. I've got one thing I do want to recommend for test-time compute. I've been calling it inference scaling; it's the same idea. There is an Alibaba model from Qwen, their Qwen research team, called QwQ, which you can run on your laptop. I've run it on my Mac, and it does the thing.
It does the thing where you give it a puzzle and it thinks. It outputs, like, sometimes dozens of paragraphs of text about how it's thinking before it gets to an answer. And watching it do that is incredibly entertaining. But the best thing about it is that occasionally it switches into Chinese. I've had my laptop think out loud in Chinese before it got to an answer.
So I asked it a question in English, it thought in Chinese for quite a while, and then it gave me an English answer. And that is just delightful.
Right. So what's not to love about seeing your laptop just do that on its own?
It is scoring higher than any of the other open-weights models. It is also, like, 685 billion parameters, so it's not easy to run. This needs data-center hardware to run it. But yeah, the benchmarks are all very impressive. It's beating the previous best one, which I think was Meta's Llama 405B. This one's, what, 685B or something? It's very good.
The thing that shocks me is, because DeepSeek have a good reputation, they've released some good models in the past. The fact that they did it for $5.5 million, that's like an eleventh of the price of the closest Meta model that Meta have documented their spending on. It's just astonishing. Yeah.
I mean, one thing I do want to highlight is that last year was the year of inference compute efficiency. Like, at the beginning of the year, the OpenAI models were literally about 100 times less expensive to run a prompt through than they were two and a half years ago.
Like, all of the providers are in this race to the bottom in terms of how much they charge per token, but it's a race based on efficiency. Like, I checked, and Google Gemini and Amazon Nova are both the cheapest hosted models, or two of the cheapest, and they're not doing it at a loss. They are at least charging you more than it costs them in electricity to run your prompt.