Andy Halliday

Speaker
3893 total appearances

Podcast Appearances

The Daily AI Show
Japan Claims AGI, Pentagon Adopts Gemini, and MIT Designs New Medicines

with a new technology that they've put forward called sparse attention.

So as you know, in transformer models, the attention mechanism is what makes it possible for you to throw just any kind of text, all these words, many of which are

you know, in human language, not that important.

All the "and"s, "the"s, and "is"es, and all those things, you know, those can be ignored to a large degree.

And the attention mechanisms determine which tokens are actually important to the reasoning task that's there.

Well, sparse attention uses this concept of, you know, cutting back the total number of tokens that are necessary by selectively

using the ones that have the most import to the inference process.

So the new sparse attention, DeepSeek Sparse Attention, they call it that, is an advanced attention mechanism that selectively prioritizes relevant input data in your prompt.

So it reduces computational overhead

and maintains high accuracy across longer contexts.
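
Editor's illustration: the idea described here, attending only to the tokens that matter most, can be sketched as a generic top-k sparse attention for a single query. This is a minimal sketch in plain Python and not DeepSeek's implementation; the function name `topk_sparse_attention` and the parameter `k` are assumptions made for illustration. Note that this toy version still scores every key before discarding most of them, so it shows the selection step rather than the real compute savings, which depend on a cheaper way of choosing tokens.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys that score highest against the query.

    Hypothetical helper for illustration: returns the attention output
    and the sorted indices of the tokens that were kept.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    # Keep the indices of the k largest scores; every other token is ignored.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax is taken over the kept tokens only.
    weights = softmax([scores[i] for i in kept])
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, kept))
        for d in range(dim)
    ]
    return output, sorted(kept)


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
    out, kept = topk_sparse_attention(query, keys, values, k=2)
    print("kept token indices:", kept)
    print("output:", out)
```

Setting `k` equal to the number of keys reduces this to ordinary dense attention, which makes the trade-off easy to see: a smaller `k` means fewer tokens contribute to the output.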

Because obviously, if you can compress to the most relevant using selective sparse attention, then you can put a much larger prompt through that gives you a larger context

delivery by the user. And then, if you can maintain accuracy and reasoning across that longer context, that's an improvement in your performance. So it allows the model to process large data sets efficiently, and it allows for larger context. But, uh, they're claiming that they beat Gemini 3.0 Pro, but they didn't beat Gemini 3.0 Pro Thinking.

Okay, so they're saying, oh, here, our thinking model, the Speciale model, beats Gemini 3.0 Pro.

Well...

Yeah, okay, it does beat that marginally, but still, the state of the art is currently Gemini 3.0 Pro Thinking. Okay, so you give Gemini 3.0 Pro just a little more budget, uh, not using this sparse attention mechanism that we know of, necessarily, but you give it a little more budget for thinking, and it is clearly the winner that way.

But anyway, I thought it was very interesting, this sparse attention approach.

It's a concept that's being used in many ways, especially in the context of mixture-of-experts models, where the sparsity is that you're going to activate only those areas of the model first

that are relevant experts to the process of inference for the prompt that's been given.
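
Editor's illustration: the mixture-of-experts sparsity mentioned here, where only the relevant experts are activated for a given input, can be sketched as top-k routing. This is a toy sketch under stated assumptions, not any particular model's router: the experts are plain scalar functions, the gate logits are passed in directly (in a real model a learned router would compute them from the input), and the names `route_top_k` and `moe_forward` are hypothetical.

```python
import math


def route_top_k(gate_logits, k):
    """Pick the k experts with the highest gate logits.

    Returns a dict mapping expert index to its weight, where the weights
    are a softmax over the selected experts only and sum to 1.
    """
    chosen = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    m = max(gate_logits[i] for i in chosen)
    exps = {i: math.exp(gate_logits[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = route_top_k(gate_logits, k)
    # Experts that were not selected are never called: that is the sparsity.
    return sum(w * experts[i](x) for i, w in weights.items())


if __name__ == "__main__":
    # Four toy "experts"; in a real model each would be a feed-forward block.
    experts = [
        lambda x: x + 1.0,
        lambda x: 2.0 * x,
        lambda x: -x,
        lambda x: x * x,
    ]
    gate_logits = [2.0, 0.1, 2.0, -1.0]  # assumed router scores for this input
    print(route_top_k(gate_logits, k=2))
    print(moe_forward(3.0, experts, gate_logits, k=2))
```

With four experts and `k=2`, half of the experts are skipped for this input, which is where the efficiency gain comes from.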

So making things more efficient, generally speaking, and DeepSeek has moved that forward.

Not to cut down DeepSeek: it's really an amazing open-source model, and it's being used widely in enterprise in the United States and around the world as the foundation model for any kind of fine-tuning and creation of a customized LLM that's operating in the enterprise context.