
AI Podcast

XQuant: Breaking Through the Memory Bottleneck of Large Language Model Inference

10 Sep 2025

Description

In this episode we take a deep dive into XQuant, a technique that trades spare compute for memory to push past the memory limits of LLM inference. How does it achieve up to 12.5x memory savings while keeping accuracy close to FP16 by quantizing the input activations X rather than the KV cache? We also look at how XQuant-CL exploits the similarity of X across layers, how the approach supports GQA models, and how this forward-looking technique can help speed up large-model applications.
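
The following is a minimal, hypothetical sketch of the core idea as the description states it: cache a quantized copy of each token's input activation X and rematerialize K and V from it at attention time, instead of caching K and V directly. The class and helper names (XQuantStyleCache, quantize, dequantize), the bit width, and the per-vector quantization scheme are illustrative assumptions, not the authors' implementation.

```python
# Sketch of caching quantized X and rematerializing K/V, NOT the XQuant code.
import numpy as np

D_MODEL, D_HEAD, N_BITS = 64, 64, 4            # toy sizes; bit width is an assumption
rng = np.random.default_rng(0)
W_K = rng.standard_normal((D_MODEL, D_HEAD)) * 0.1
W_V = rng.standard_normal((D_MODEL, D_HEAD)) * 0.1

def quantize(x, n_bits=N_BITS):
    """Symmetric per-vector quantization: low-bit ints plus one float scale."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

class XQuantStyleCache:
    """Cache quantized X per token; rebuild K and V when attention needs them.

    A standard KV cache stores two vectors (K and V) per token per layer;
    caching only X stores one, and low-bit quantization shrinks it further.
    The price is two extra matmuls (X @ W_K, X @ W_V) at decode time, i.e.
    trading compute for memory, which is the trade-off the episode describes.
    """
    def __init__(self):
        self.q_x, self.scales = [], []

    def append(self, x):                        # x: (D_MODEL,) input activation
        q, s = quantize(x)
        self.q_x.append(q)
        self.scales.append(s)

    def rematerialize(self):
        X = np.stack([dequantize(q, s) for q, s in zip(self.q_x, self.scales)])
        return X @ W_K, X @ W_V                 # K, V: (n_tokens, D_HEAD)

# Usage: append each new token's activation, then rebuild K/V for attention.
cache = XQuantStyleCache()
for _ in range(5):
    cache.append(rng.standard_normal(D_MODEL))
K, V = cache.rematerialize()
print(K.shape, V.shape)                         # (5, 64) (5, 64)
```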


