Dr. Alexander Wissner-Gross
Podcast Appearances
We should be in the hundreds of trillions or higher of parameters right now.
That hasn't happened.
We've plateaued in terms of the number of parameters in frontier models, and that's in part due to the reasoning model revolution and in part due to distillation, which go hand in hand.
So those are some preliminary thoughts.
It's sort of interesting to me that he hasn't yet merged video generation with all of the other models.
Google DeepMind has made lots of noises about starting to merge video in as a first-class modality with their multimodal reasoning models.
Again, I don't have access to the ground truth for how capable Gemini general-purpose models are at video generation.
We've seen, obviously, that Google's video generation models have been kept distinct from a user interface perspective. Presumably they're diffusion transformer-based rather than transformer-based.
We don't know.
Punchline, I would say that this seems like a healthy family for SpaceX AI, the newly emerged entity, to be offering, but there really aren't any big shockers in terms of the ranges, other than maybe that they've abandoned the low end.
Google is very much tending to small parameter counts, sub-trillion in a few cases.
Google is releasing, via the Gemma models, few-billion-parameter models.
Elon has completely abandoned the low end in favor of brute force scaling, which is exactly what I'd expect from him anyway.
Well, remember, the way reasoning models are trained these days is usually, at least according to my understanding from all of the other frontier labs, you train the largest model you possibly can, and then you distill it down to smaller models.
So it's not necessarily as if the 10T model even needs to be released.
It might be for the purpose of serving as a teacher model that can then be distilled down to more releasable models.
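[Editor's illustration] The teacher-to-student step described above can be sketched roughly as follows. This is a minimal sketch, assuming the classic soft-label form of distillation, where the student is trained to match the teacher's temperature-softened output distribution under a KL divergence. The function names, the temperature value, and the example logits are all hypothetical. Nothing in the conversation confirms that any particular lab trains this way.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: the T^2 factor is the usual rescaling used so
    gradient magnitudes stay comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Made-up logits over three tokens, for illustration only.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
loss_same = distillation_loss(teacher, teacher)
```

A student whose outputs track the teacher's gets a smaller loss than one that disagrees, and a student that matches the teacher exactly gets a loss of zero. That is the sense in which a very large model can serve purely as a training signal for smaller, releasable ones.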
It's a management style.
Some would say management by crisis.