Andy Halliday
There are models out there that you can run on a higher class of hardware that have up to a trillion parameters in them. But typically you might see, say, a 17 billion version and a 50 billion version of the model. They basically collapse and compress the training into a smaller and smaller model so that it can run more efficiently on a local device without, you know, burning up the CPU in that device.
Because at present, mobile devices like your iPhone, for example, don't have the kind of powerful GPU that can do the large-scale matrix multiplication that's necessary for model inference.
You can't do that very well on the phone, so it has to run through the CPU.
And that means it has to be a smaller model in order to run well.
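The core operation behind inference is matrix multiplication: a dense layer turns an input vector into an output vector by multiplying it with a weight matrix, so every weight gets used once for each generated token. The sketch below, in plain Python with hypothetical helper names, shows that operation and the common back-of-envelope estimate of roughly two floating-point operations (one multiply, one add) per parameter per token. It's a rule of thumb only: it ignores attention over the context and any hardware-specific optimizations.

```python
def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input length")
    # Each output element is a dot product: one multiply and one add per weight.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def approx_flops_per_token(n_params: int) -> int:
    """Rule-of-thumb compute per generated token for a dense model (~2 FLOPs per weight)."""
    return 2 * n_params


if __name__ == "__main__":
    # A toy 2x3 "layer" applied to a 3-element input.
    print(matvec([[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]], [1.0, 1.0, 1.0]))  # [6.0, -0.5]
    # Roughly 2.4 billion FLOPs per token for a 1.2B-parameter model.
    print(approx_flops_per_token(1_200_000_000))
```

Because the cost scales with the parameter count, shrinking the model is the most direct way to make CPU-only inference practical.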
So that's kind of a long-winded introduction to what's happening with these smaller models getting down to the 1 billion class.
This LFM 2.5 is a 1.2 billion parameter model, available in instruct and thinking versions.
And it's going up against the Llama 3.2 1B Instruct model.
And Gemma, that's Google's Gemma 3, the distilled sibling of the Gemini models, at 1 billion parameters, instruction-tuned. And it's up against Qwen3, that's Alibaba's model, which has 1.7 billion parameters.
So Qwen3 has a bit of an advantage, because it goes well past the 1 billion level where Gemma and Llama sit, and that LFM only edges over at 1.2 billion.
But each of these is small enough to run on a local device.
For example, they say that LFM 2.5 offers extremely fast inference speed on CPUs, so not GPUs but CPUs, with a low memory profile compared to similarly sized models like Gemma, Llama, and Qwen.
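A low memory profile matters because, at minimum, the weights have to fit in the device's RAM. As a rough illustration, the sketch below multiplies each model's nominal parameter count by the bytes per weight at a few common precisions (fp16, int8, int4). The parameter counts are the round numbers mentioned here, not exact official figures, and the estimate covers weight storage only. It does not include the KV cache, activations, or whatever runtime optimizations are behind the memory advantage claimed for LFM 2.5.

```python
# Nominal parameter counts as quoted in the discussion (not exact official figures).
MODELS = {
    "Llama 3.2 1B Instruct": 1.0e9,
    "Gemma 3 1B": 1.0e9,
    "LFM 2.5 1.2B": 1.2e9,
    "Qwen3 1.7B": 1.7e9,
}

# Bytes needed to store one weight at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes); excludes KV cache and activations."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, n in MODELS.items():
        sizes = ", ".join(f"{p}: {weight_memory_gb(n, p):.2f} GB" for p in BYTES_PER_PARAM)
        print(f"{name:<22} {sizes}")
```

Going from fp16 to int4 cuts weight storage by a factor of four, which is one reason quantized small models are common on phones and laptops.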
Okay, so now I'm going to show you what this new model does compared to those on two very important benchmarks.
The first is called GPQA, which is the Graduate-Level Google-Proof Q&A benchmark.
So it's a set of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
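Scoring a multiple-choice benchmark like GPQA comes down to accuracy, meaning the fraction of questions where the model picks the keyed answer. GPQA questions have four answer options, so random guessing lands around 25%, which is a useful floor when reading small-model results. The sketch below uses made-up answers (hypothetical data, not real benchmark results) and hypothetical `accuracy` and `chance_baseline` helpers. Real evaluation harnesses also handle answer extraction and option shuffling, which are left out here.

```python
def accuracy(predictions: list[str], answer_key: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the keyed answer."""
    if len(predictions) != len(answer_key) or not answer_key:
        raise ValueError("need one prediction per question, and at least one question")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)


def chance_baseline(num_choices: int = 4) -> float:
    """Expected accuracy from uniform random guessing."""
    return 1 / num_choices


if __name__ == "__main__":
    # Hypothetical answers for five questions; not real GPQA data.
    key = ["A", "C", "B", "D", "A"]
    preds = ["A", "B", "B", "D", "C"]
    print(f"accuracy: {accuracy(preds, key):.0%}")  # accuracy: 60%
    print(f"chance:   {chance_baseline():.0%}")     # chance:   25%
```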