PageANN: Scalable Disk ANNS with Page-Aligned Graphs
07 Dec 2025
Contributed by Lukas
The research paper presents PageANN, a novel framework engineered to overcome the severe latency and...
NeurIPS 2025: Homogeneous Keys, Heterogeneous Values
04 Dec 2025
Contributed by Lukas
This research presents a novel method for efficient long-context modeling in Large Language Models (...
NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
29 Nov 2025
Contributed by Lukas
The research systematically investigates the effects of integrating various gating mechanisms into t...
NeurIPS 2025: Large Language Diffusion Models
29 Nov 2025
Contributed by Lukas
This research paper introduces LLaDA, an 8-billion parameter language model based on the masked diff...
NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example
29 Nov 2025
Contributed by Lukas
This research examines the data efficiency of Reinforcement Learning with Verifiable Reward (RLVR) w...
NeurIPS 2025: Parallel Scaling Law for Language Models
29 Nov 2025
Contributed by Lukas
The research proposes Parallel Scaling (PARSCALE) as a novel, efficient strategy to enhance Large La...
NeurIPS 2025: SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
29 Nov 2025
Contributed by Lukas
The academic paper introduces Self-play Reinforcement Learning (SeRL), a framework engineered to enh...
NeurIPS 2025: DYNAACT: Large Language Model Reasoning with Dynamic Action Spaces
29 Nov 2025
Contributed by Lukas
The provided text outlines DYNAACT, a new framework intended to enhance sequential reasoning in Larg...
NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
29 Nov 2025
Contributed by Lukas
The academic paper introduces KGGen, a novel text-to-knowledge-graph generator designed to overcome ...
NeurIPS 2025: Self-Adapting Language Models
29 Nov 2025
Contributed by Lukas
The academic paper presents the Self-Adapting LLM (SEAL) framework, designed to allow large language...
NeurIPS 2025: Thinkless: LLM Learns When to Think
29 Nov 2025
Contributed by Lukas
The research introduces Thinkless, a framework designed to solve the computational inefficiency of L...
NeurIPS 2025: FlashBias: Fast Computation of Attention with Bias
29 Nov 2025
Contributed by Lukas
The source introduces FlashBias, an innovative algorithm designed to significantly accelerate the ef...
NeurIPS 2025: A-Mem: Agentic Memory for LLM Agents
29 Nov 2025
Contributed by Lukas
The source details the creation and evaluation of Agentic Memory (A-MEM), a novel memory system for ...
NeurIPS 2025: MoBA: Mixture of Block Attention for Long-Context LLMs
29 Nov 2025
Contributed by Lukas
This paper introduces Mixture of Block Attention (MoBA) to address the prohibitive quadratic computa...
NeurIPS 2025: Reward Reasoning Model
29 Nov 2025
Contributed by Lukas
The source details the development and evaluation of Reward Reasoning Models (RRMs), which are desig...
Anthropic: Disrupting the First AI-Orchestrated Cyber Espionage Campaign
27 Nov 2025
Contributed by Lukas
Anthropic released a detailed report outlining the detection and disruption of an advanced cyber esp...
Anthropic: reward hacking & misalignment & sabotage
22 Nov 2025
Contributed by Lukas
Anthropic's research details how **realistic AI training processes can inadvertently create misali...
DeepSeek-OCR: Contexts Optical Compression
22 Nov 2025
Contributed by Lukas
The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) desig...
Neuromorphic computing: Brain-Inspired AI and Hardware
22 Nov 2025
Contributed by Lukas
These sources provide a comprehensive overview of **neuromorphic computing (NC)**, focusing heavily ...
Meta: SAM 3
20 Nov 2025
Contributed by Lukas
This November 18, 2025 Meta paper details the development, training, and evaluation of **Segment Anyt...
Mamba-360: State Space Models for Long Sequence Modeling
19 Nov 2025
Contributed by Lukas
The April 24, 2024 paper provides a comprehensive **survey of State Space Models (SSMs)**, outlining...
Mixture-of-Depths: Dynamic Compute Allocation in Transformers
19 Nov 2025
Contributed by Lukas
This April 4, 2024 Google DeepMind paper introduces the **Mixture-of-Depths (MoD)** transformer arc...
MLP Mixer Models
19 Nov 2025
Contributed by Lukas
These sources collectively explore the **MLP-Mixer architecture** and its numerous extensions across...
Marin: Open LLM Optimization & Diagnostics
19 Nov 2025
Contributed by Lukas
Marin is an open lab dedicated to the transparent research and development of foundation models (FMs...
vAttention Vs Strata: advanced GPU memory management
19 Nov 2025
Contributed by Lukas
We compare and contrast two advanced 2025 memory management and scheduling techniques for optimizing...
AMD: Instella: Fully Open Language Models with Stellar Performance
16 Nov 2025
Contributed by Lukas
The November 13, 2025 paper by AMD introduces **Instella**, a new family of **fully open-source** thr...
Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features
15 Nov 2025
Contributed by Lukas
Ten different sources are used in this episode which are excerpts from academic papers and technical...
Spectral Gap: Analysis of Attention Layers and Graph Transformers
10 Nov 2025
Contributed by Lukas
We review two papers on Spectral Gap, one 2021 and another from 2025. The first source presents the ...
CARTRIDGE: Efficient In-Context Learning via Distillation
10 Nov 2025
Contributed by Lukas
The June 13, 2025 joint collaboration between Stanford University, Caltech and University at Buffalo...
Metacognition and Skill Discovery in LLM Math Reasoning
10 Nov 2025
Contributed by Lukas
The May 20, 2024 academic paper explores the **metacognitive capabilities of Large Language Models (...
Context Distillation for Language Models
10 Nov 2025
Contributed by Lukas
These five papers from 2022 up to 2025 discuss various **knowledge distillation techniques** aimed a...
Tempo: SLO-Aware LLM Serving Maximizing Service Gain
10 Nov 2025
Contributed by Lukas
The April 24, 2025 academic paper introduces **Tempo**, a novel scheduling system designed to optimi...
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow
10 Nov 2025
Contributed by Lukas
The January 30, 2025 paper introduces **LLM-AutoDiff**, a novel framework for **Automatic Prompt Eng...
Confucius: Intent-Driven Network Management with Multi-Agent LLMs
10 Nov 2025
Contributed by Lukas
The August 27, 2025 paper introduces **Confucius**, a novel multi-agent Large Language Model (LLM) f...
SYMPHONY: Memory Management for LLM Multi-Turn Inference
10 Nov 2025
Contributed by Lukas
The 2024 paper introduces **SYMPHONY**, a novel system designed to improve memory management and sch...
DSPy and TextGrad: Compiling Language Model Systems
10 Nov 2025
Contributed by Lukas
These two academic papers introduce novel programming models aimed at systematically optimizing comp...
Vidur: Simulation for Efficient LLM Inference Deployment
10 Nov 2025
Contributed by Lukas
The May 21, 2024 paper introduces **Vidur**, a new, high-fidelity simulation framework designed to o...
Continuous Autoregressive Language Models: CALM
10 Nov 2025
Contributed by Lukas
The October 31, 2025 paper introduces **Continuous Autoregressive Language Models (CALM)**, a new pa...
A Framework for LLM Application Safety Evaluation
10 Nov 2025
Contributed by Lukas
The July 13, 2025 paper "Measuring What Matters: A Framework for Evaluating Safety Risks in Real-Wo...
Doubly Stochastic Attention for Transformers
10 Nov 2025
Contributed by Lukas
The four papers we review, ranging from 1967 to two papers from 2025, collectively discuss the mathemat...
Random Walk Methods for Graph Learning and Networks
10 Nov 2025
Contributed by Lukas
We review the evolution of PageRank into Random Walk with Random Restart and it...
AlphaEvolve: Mathematical Discovery at Scale
10 Nov 2025
Contributed by Lukas
The November 3, 2025 paper provides an overview of the **AlphaEvolve** system, an AI-powered evolutio...
AdaFlow: Variance-Adaptive Flow-Based Imitation Learning
10 Nov 2025
Contributed by Lukas
The November 22, 2024 paper from UT Austin introduces **AdaFlow**, a novel imitation learning framewo...
zFLoRA: Zero-Latency Fused Low-Rank Adapters
04 Nov 2025
Contributed by Lukas
The October 28, 2025 Samsung research paper introduces **zFLoRA (zero-latency fused low-rank adapter...
SuperBPE: Space Travel for Language Models
04 Nov 2025
Contributed by Lukas
The August 26, 2025 collaboration between the University of Washington, NVIDIA and the Allen Institu...
Google: Supervised Reinforcement Learning for Step-wise Reasoning in LLMs
04 Nov 2025
Contributed by Lukas
The October 29, 2025 Google research paper introduces **Supervised Reinforcement Learning (SRL)**, a ...
MorphKV: Constant-Sized KV Caches for LLM Inference
04 Nov 2025
Contributed by Lukas
The June 7, 2025 UT Austin and University of British Columbia collaboration academic paper introduce...
HALoS: Hierarchical Asynchronous LLM Training over Slow Networks
04 Nov 2025
Contributed by Lukas
The June 5, 2025 research paper introduces **HALoS: Hierarchical Asynchronous Local SGD**, a novel ...
Anchored Diffusion Language Model: Superior Generation and Reasoning
04 Nov 2025
Contributed by Lukas
The May 24, 2025 UT Austin paper introduces the **Anchored Diffusion Language Model (ADLM)**, a nove...
Gumbel-Softmax for Differentiable Categorical Reparameterization and Selective Networks
04 Nov 2025
Contributed by Lukas
These two papers (years 2017, 2022) introduce and then apply the **Gumbel-Softmax distribution** as ...
PolicySmith: Automated Systems Heuristic Generation via LLMs
04 Nov 2025
Contributed by Lukas
The October 9, 2025 paper from UT Austin introduces **PolicySmith**, a novel framework that au...
RetNet: Retentive Networks: Transformer Successor for Large Language Models
02 Nov 2025
Contributed by Lukas
The August 9, 2023 paper introduces the **Retentive Network (RetNet)**, a proposed foundational arch...
Kimi Linear: Efficient Expressive Attention Architecture
02 Nov 2025
Contributed by Lukas
The October 30, 2025 **technical report** details the development and evaluation of **Kimi Linear**,...
ALiBi: Attention with Linear Biases Enables Length Extrapolation
01 Nov 2025
Contributed by Lukas
The April 22, 2022 collaboration between University of Washington, Facebook AI and the Allen Institu...
Quest: Query-Aware Sparsity for Efficient LLM Inference
31 Oct 2025
Contributed by Lukas
The August 26, 2024 academic paper introduces **Quest**, a novel algorithm designed to improve the i...
Flash-LLM: Efficient LLM Inference with Unstructured Sparsity on Tensor Cores
31 Oct 2025
Contributed by Lukas
The September 19, 2025 Alibaba paper introduces **Flash-LLM**, a novel software framework designed t...
ELASTIC: Linear Attention for Sequential Interest Compression
31 Oct 2025
Contributed by Lukas
The February 12, 2025 KuaiShou Inc paper introduces **ELASTIC**, an Efficient Linear Attention for S...
Anthropic: Introspective Awareness in LLMs
31 Oct 2025
Contributed by Lukas
On October 29, 2025 Anthropic presented research investigating the existence of **functional introsp...
Small Versus Large Models for Requirements Classification
31 Oct 2025
Contributed by Lukas
The October 24, 2025 paper, a collaboration between many universities, compares th...
Hyper-Scaling LLM Inference with KV Cache Compression
31 Oct 2025
Contributed by Lukas
The June 5, 2025 paper, a collaboration between University of Edinburgh and NVIDIA, introduces the conce...
Architectural Scaling Laws for Efficient LLMs
31 Oct 2025
Contributed by Lukas
The October 21, 2025 collaboration paper between UW-Madison and Amazon Web Services discusses the crit...
ATTENTION2D and lean attention: Distributed Self-Attention
29 Oct 2025
Contributed by Lukas
We cover two new innovations from Microsoft extending ideas from the original old **FlashAttention**...
Sentence-BERT: Siamese Networks for Sentence Embeddings
29 Oct 2025
Contributed by Lukas
The provided text introduces **Sentence-BERT (SBERT)**, a modification of the popular **BERT** and *...
TxGNN: Foundation Model for Zero-Shot Drug Repurposing
29 Oct 2025
Contributed by Lukas
The source provides excerpts from a scientific paper introducing **TxGNN**, a novel graph foundation...
STAR: Sub-Entry Sharing TLB for Multi-Instance GPU Efficiency
26 Oct 2025
Contributed by Lukas
This April 29, 2024 paper provides an overview of the challenges associated with using **NVIDIA's M...
Strata: Efficient Hierarchical Context Caching for LLM Serving
26 Oct 2025
Contributed by Lukas
The August 26, 2025 collaboration between Stanford, NVIDIA, Shanghai Jiao Tong University, Universit...
FlashAttention: IO-Aware Fast and Memory-Efficient Attention
26 Oct 2025
Contributed by Lukas
This is a classic review of a now old but still important paper, the original Flash Attention pa...
Introducing MTEB v2: Multimodal Embedding Evaluation
26 Oct 2025
Contributed by Lukas
On October 20, 2025 Hugging Face released **MTEB v2**, a significant refactoring of the Massive Text...
Structural Understanding of LLM Overthinking
26 Oct 2025
Contributed by Lukas
The October 10, 2025 paper from the University of Michigan and **Google DeepMind** concerning the ph...
Stuck in the Matrix: LLM Spatial Reasoning
26 Oct 2025
Contributed by Lukas
The October 23, 2025 research paper **probes the spatial reasoning capabilities of Large Language Mod...
LLM-Empowered Knowledge Graph Construction: A Survey
26 Oct 2025
Contributed by Lukas
This October 23, 2025 Xidian University academic survey systematically reviews the transformative im...
Survey of Emerging Topics in AI and Robotics
26 Oct 2025
Contributed by Lukas
The October 23, 2025 collaboration between UC San Diego, NVIDIA, Meta, UW-Madison, and UNC intro...
The Free Transformer: VAE Extension for Decoders
26 Oct 2025
Contributed by Lukas
The October 20, 2025 Meta FAIR paper introduces the **Free Transformer**, an innovative extension of...
LithOS: Operating System for Efficient GPU Machine Learning
26 Oct 2025
Contributed by Lukas
This 2025 CMU paper introduces **LithOS**, a novel operating system designed to improve the efficien...
Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning
26 Oct 2025
Contributed by Lukas
This October 23, 2025 technical report from the Ling Team introduces the **Ring-linear model series*...
GigaBrain-0: World Model-Powered Generalist Robots
26 Oct 2025
Contributed by Lukas
The October 22, 2025 GigaAI paper introduces **GigaBrain-0**, a novel Vision-Language-Action (VLA) m...
Open-o3 Video: Spatio-Temporal Grounded Reasoning
26 Oct 2025
Contributed by Lukas
The October 25, 2025 Bytedance paper introduces **Open-o3 Video**, a novel framework developed by re...
Cattell–Horn–Carroll Theory of Intelligence
26 Oct 2025
Contributed by Lukas
We review the Cattell-Horn-Carroll (CHC) theory used in recent AI papers on the definition of what AGI coul...
Internal Mechanisms of a Large Language Model
26 Oct 2025
Contributed by Lukas
This March 27, 2025 Anthropic paper provides an overview and detailed excerpts from two related Anth...
Latent Constituency in Humans and LLMs
26 Oct 2025
Contributed by Lukas
The provided text is an academic paper titled **"Active Use of Latent Constituency Representation in...
Cognitive Impact of AI and Search on Essay Writing
26 Oct 2025
Contributed by Lukas
The June 2025 paper presents excerpts from a study examining the **cognitive and performance differe...
LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts
26 Oct 2025
Contributed by Lukas
The October 7, 2025 technical release by Liquid AI introduces their new model, **LFM2-8B-A1B**, an ...
MASA: Meta-Awareness via Self-Alignment Reinforcement Learning
26 Oct 2025
Contributed by Lukas
The September 26, 2025 paper introduces a novel reinforcement learning framework called **Meta-Aware...
LLMs Learning from Verbal Feedback Without Scalar Rewards
26 Oct 2025
Contributed by Lukas
The September 25, 2025 collaboration between Sea AI Lab, SUTD, NUS, NTU and University of Waterloo p...
Lp-Reg: Low-Probability Tokens Sustain RL Exploration
26 Oct 2025
Contributed by Lukas
The October 3, 2025 paper by Tencent introduces a reinforcement learning technique called **Low-prob...
REFRAG: v2 paper: Efficient RAG Decoding via Context Compression
22 Oct 2025
Contributed by Lukas
The Meta Superintelligence Labs team in collaboration with Rice University and National University o...
RoBERTa: Robustly Optimized BERT Pretraining Approach
22 Oct 2025
Contributed by Lukas
The July 2019 paper introduces **RoBERTa**, a **robustly optimized BERT pretraining approach**, whic...
LightMem: Lightweight Efficient Memory-Augmented Generation
22 Oct 2025
Contributed by Lukas
The October 21, 2025 academic paper introduces **LightMem**, a novel and efficient memory-augmented ...
RAG-Anything: Unified Multimodal Knowledge Retrieval Framework
22 Oct 2025
Contributed by Lukas
The October 14, 2025 paper introduces **RAG-Anything**, a novel and unified framework for **Retrieva...
Elastic-Cache: Adaptive KV Caching for Diffusion LLMs
22 Oct 2025
Contributed by Lukas
The October 16, 2025 academic paper introduces **Elastic-Cache**, an innovative, training-free strat...
LLM-Guided Hierarchical Retrieval: The LATTICE Framework
22 Oct 2025
Contributed by Lukas
The October 15, 2025 paper details a novel information retrieval framework called **LATTICE**, which...
In-Context Learning as Implicit Learning Algorithms
22 Oct 2025
Contributed by Lukas
The May 17, 2023 academic paper explores the nature of **in-context learning (ICL)** in neural seque...
Dr.LLM: Dynamic Layer Routing in LLMs
22 Oct 2025
Contributed by Lukas
The October 14, 2025 research paper introduces **Dr.LLM**, a novel, retr...
A Psychometric Framework for Artificial General Intelligence
22 Oct 2025
Contributed by Lukas
This large collaboration between 29 different institutions proposes a quantifiable framework for def...
EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm
22 Oct 2025
Contributed by Lukas
The October 12, 2025 paper introduces **EssenceBench**, a novel methodology for **compressing large ...
Inheritune: Efficient LLM Training via Attention Collapse
22 Oct 2025
Contributed by Lukas
This June 8, 2025 paper, a collaboration between University of Texas and NYU, describes a newly identifi...
Structural Understanding of LLM Overthinking
22 Oct 2025
Contributed by Lukas
The October 10, 2025 academic paper from Google DeepMind and the University of Michigan investigates...
Geometric Flows of Logic in LLM Representation Space
18 Oct 2025
Contributed by Lukas
The October 10, 2025 Duke University academic paper introduces a **novel geometric framework** that ...
Mojo: Performance-Portable HPC Kernels on GPUs
18 Oct 2025
Contributed by Lukas
The September 25, 2025 academic paper **evaluates the performance and portability** of the novel **Mo...
Scaling Reinforcement Learning Compute for LLMs
17 Oct 2025
Contributed by Lukas
This October 15, 2025 collaboration between Meta, UT Austin, UCL, UC Berkeley, Harvard University, a...