AI: post transformers

Adaptive Compression Techniques for Efficient LLM Inference

20 Sep 2025

Description

These 14 research papers provide an overview of **compression techniques for Large Language Models (LLMs)**, primarily focused on **reducing the size and computational overhead of the Key-Value (KV) cache** so that long contexts can be handled more efficiently. Several novel methods are detailed, including **GVote**, an adaptive compression algorithm that uses query sampling and voting to find an optimal cache budget, and **SnapKV**, which selects clustered, important KV positions based on an "observation" window to maintain performance while improving speed and memory efficiency (a minimal code sketch of this selection step follows the source list). Other approaches include **POD (Proximal tokens over Distant tokens)**, which reduces redundancy by sharing key states across layers for distant tokens while preserving proximal ones, and **DecoQuant**, a quantization method that uses matrix decomposition to reduce quantization errors. The sources also examine **prompt compression methods** such as **LLMLingua** and **LongLLMLingua**, and describe **CASC (Context-Adaptive Synthesis and Compression)**, a Retrieval-Augmented Generation (RAG) framework that synthesizes and compresses multi-document contexts to improve answer accuracy in complex domains.

Sources:

- https://arxiv.org/pdf/2509.08315
- https://arxiv.org/html/2509.09199v1
- https://arxiv.org/html/2509.03136v1
- https://aclanthology.org/2025.acl-long.1394.pdf
- https://proceedings.neurips.cc/paper_files/paper/2024/file/fd0705710bf01b88a60a3d479ea341d9-Paper-Conference.pdf
- https://arxiv.org/html/2412.14838v1
- https://arxiv.org/pdf/2412.02252
- https://aclanthology.org/2024.acl-long.133.pdf
- https://arxiv.org/html/2508.19357v1
- https://aclanthology.org/2024.acl-long.91.pdf
- https://arxiv.org/html/2310.05736v2
- https://aclanthology.org/2025.naacl-long.368.pdf
- https://arxiv.org/pdf/2404.14469
- https://aclanthology.org/2024.findings-emnlp.266.pdf
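Of the methods above, SnapKV is described concretely enough to sketch in code. The following is a minimal, single-head PyTorch illustration based only on the summary here, not on the paper's reference implementation; the function name, tensor shapes, and default parameters (`window`, `budget`, `pool_kernel`) are illustrative assumptions. The last `window` queries act as the observation window, their attention scores over the prefix keys are aggregated as votes, max-pooling keeps clustered neighbours together, and the top-`budget` positions (plus the window itself) are retained.

```python
import torch
import torch.nn.functional as F

def snapkv_select(keys: torch.Tensor, queries: torch.Tensor,
                  window: int = 32, budget: int = 256,
                  pool_kernel: int = 7) -> torch.Tensor:
    """Pick which KV-cache positions to keep for one attention head.

    keys, queries: (seq_len, head_dim) key/query states.
    Returns the indices of positions to retain (selected prefix + window).
    Hypothetical sketch of SnapKV-style selection, not the official code.
    """
    head_dim = keys.shape[-1]
    obs_q = queries[-window:]          # "observation" window queries
    prefix_k = keys[:-window]          # candidate positions to prune
    # Attention of the observation queries over the prefix keys.
    scores = (obs_q @ prefix_k.T) / head_dim ** 0.5   # (window, prefix)
    votes = scores.softmax(dim=-1).sum(dim=0)         # aggregate votes
    # Max-pooling clusters neighbouring important positions, so locally
    # coherent spans are kept together rather than isolated tokens.
    votes = F.max_pool1d(votes[None, None], pool_kernel,
                         stride=1, padding=pool_kernel // 2)[0, 0]
    keep = votes.topk(min(budget, votes.numel())).indices.sort().values
    # The observation window itself is always retained.
    window_idx = torch.arange(keys.shape[0] - window, keys.shape[0])
    return torch.cat([keep, window_idx])

# Usage: compress a 4096-token cache for one head down to ~288 entries.
k, q = torch.randn(4096, 64), torch.randn(4096, 64)
idx = snapkv_select(k, q)
k_compressed = k[idx]   # the same index set is reused for the value cache
```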
