AI: post transformers
Episodes
PageANN: Scalable Disk ANNS with Page-Aligned Graphs
07 Dec 2025
Contributed by Lukas
The research paper presents PageANN, a novel framework engineered to overcome the severe latency and scalability limitations facing existing **disk-ba...
NeurIPS 2025: Homogeneous Keys, Heterogeneous Values
04 Dec 2025
Contributed by Lukas
This research presents a novel method for efficient long-context modeling in Large Language Models (LLMs) by tackling the quadratic complexity of atte...
NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
29 Nov 2025
Contributed by Lukas
The research systematically investigates the effects of integrating various gating mechanisms into the standard softmax attention layer, comparing ove...
NeurIPS 2025: Large Language Diffusion Models
29 Nov 2025
Contributed by Lukas
This research paper introduces LLaDA, an 8-billion parameter language model based on the masked diffusion model (MDM) architecture, specifically devel...
NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example
29 Nov 2025
Contributed by Lukas
This research examines the data efficiency of Reinforcement Learning with Verifiable Reward (RLVR) when applied to large language models for mathemati...
NeurIPS 2025: Parallel Scaling Law for Language Models
29 Nov 2025
Contributed by Lukas
The research proposes Parallel Scaling (PARSCALE) as a novel, efficient strategy to enhance Large Language Model (LLM) capacity by increasing parallel...
NeurIPS 2025: SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
29 Nov 2025
Contributed by Lukas
The academic paper introduces Self-play Reinforcement Learning (SeRL), a framework engineered to enhance the reasoning capabilities of Large Language ...
NeurIPS 2025: DYNAACT: Large Language Model Reasoning with Dynamic Action Spaces
29 Nov 2025
Contributed by Lukas
The provided text outlines DYNAACT, a new framework intended to enhance sequential reasoning in Large Language Models (LLMs) by dynamically managing t...
NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
29 Nov 2025
Contributed by Lukas
The academic paper introduces KGGen, a novel text-to-knowledge-graph generator designed to overcome the scarcity and poor quality of automatically ext...
NeurIPS 2025: Self-Adapting Language Models
29 Nov 2025
Contributed by Lukas
The academic paper presents the Self-Adapting LLM (SEAL) framework, designed to allow large language models to overcome their static nature by transfo...
NeurIPS 2025: Thinkless: LLM Learns When to Think
29 Nov 2025
Contributed by Lukas
The research introduces Thinkless, a framework designed to solve the computational inefficiency of Large Language Models (LLMs) that overuse chain-of-...
NeurIPS 2025: FlashBias: Fast Computation of Attention with Bias
29 Nov 2025
Contributed by Lukas
The source introduces FlashBias, an innovative algorithm designed to significantly accelerate the efficiency of the Transformer attention mechanism wh...
NeurIPS 2025: A-Mem: Agentic Memory for LLM Agents
29 Nov 2025
Contributed by Lukas
The source details the creation and evaluation of Agentic Memory (A-MEM), a novel memory system for Large Language Model (LLM) agents that addresses t...
NeurIPS 2025: MoBA: Mixture of Block Attention for Long-Context LLMs
29 Nov 2025
Contributed by Lukas
This paper introduces Mixture of Block Attention (MoBA) to address the prohibitive quadratic computational overhead inherent in traditional attention ...
NeurIPS 2025: Reward Reasoning Model
29 Nov 2025
Contributed by Lukas
The source details the development and evaluation of Reward Reasoning Models (RRMs), which are designed to enhance Large Language Model (LLM) alignmen...
Anthropic: Disrupting the First AI-Orchestrated Cyber Espionage Campaign
27 Nov 2025
Contributed by Lukas
Anthropic released a detailed report outlining the detection and disruption of an advanced cyber espionage campaign identified in late 2025, which the...
Anthropic: reward hacking & misalignment & sabotage
22 Nov 2025
Contributed by Lukas
Anthropic's research details how **realistic AI training processes can inadvertently create misaligned models** through a mechanism called "reward h...
DeepSeek-OCR: Contexts Optical Compression
22 Nov 2025
Contributed by Lukas
The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) designed to investigate the feasibility of **contexts o...
Neuromorphic computing: Brain-Inspired AI and Hardware
22 Nov 2025
Contributed by Lukas
These sources provide a comprehensive overview of **neuromorphic computing (NC)**, focusing heavily on specialized hardware and advanced Spiking Neura...
Meta: SAM 3
20 Nov 2025
Contributed by Lukas
This November 18, 2025 Meta paper details the development, training, and evaluation of **Segment Anything Model 3 (SAM 3)**, a promptable segmentation ...
Mamba-360: State Space Models for Long Sequence Modeling
19 Nov 2025
Contributed by Lukas
The April 24, 2024 paper provides a comprehensive **survey of State Space Models (SSMs)**, outlining their evolution, fundamental mathematical princip...
Mixture-of-Depths: Dynamic Compute Allocation in Transformers
19 Nov 2025
Contributed by Lukas
This April 4, 2024 Google DeepMind paper introduces the **Mixture-of-Depths (MoD)** transformer architecture, a method that improves efficiency by le...
MLP Mixer Models
19 Nov 2025
Contributed by Lukas
These sources collectively explore the **MLP-Mixer architecture** and its numerous extensions across computer vision and audio tasks. The core concept...
Marin: Open LLM Optimization & Diagnostics
19 Nov 2025
Contributed by Lukas
Marin is an open lab dedicated to the transparent research and development of foundation models (FMs), focusing its core mission on identifying **how ...
vAttention Vs Strata: advanced GPU memory management
19 Nov 2025
Contributed by Lukas
We compare and contrast two advanced 2025 memory management and scheduling techniques for optimizing Large Language Model (LLM) serving throughput and...
AMD: Instella: Fully Open Language Models with Stellar Performance
16 Nov 2025
Contributed by Lukas
The November 13, 2025 paper by AMD introduces **Instella**, a new family of **fully open-source** three-billion-parameter large language models (LLMs) ...
Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features
15 Nov 2025
Contributed by Lukas
This episode draws on ten different sources: excerpts from academic papers and technical reports focusing on mechanistic interpretability ...
Spectral Gap: Analysis of Attention Layers and Graph Transformers
10 Nov 2025
Contributed by Lukas
We review two papers on Spectral Gap, one from 2021 and another from 2025. The first source presents the **Spectral Attention Network (SAN)**, a novel Tran...
CARTRIDGE: Efficient In-Context Learning via Distillation
10 Nov 2025
Contributed by Lukas
The June 13, 2025 joint collaboration between Stanford University, Caltech and University at Buffalo introduces a novel method called **CARTRIDGE** fo...
Metacognition and Skill Discovery in LLM Math Reasoning
10 Nov 2025
Contributed by Lukas
The May 20, 2024 academic paper explores the **metacognitive capabilities of Large Language Models (LLMs)**, specifically focusing on mathematical pro...
Context Distillation for Language Models
10 Nov 2025
Contributed by Lukas
These five papers, published from 2022 to 2025, discuss various **knowledge distillation techniques** aimed at transferring the capabilities of large language ...
Tempo: SLO-Aware LLM Serving Maximizing Service Gain
10 Nov 2025
Contributed by Lukas
The April 24, 2025 academic paper introduces **Tempo**, a novel scheduling system designed to optimize Large Language Model (LLM) serving by addressin...
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow
10 Nov 2025
Contributed by Lukas
The January 30, 2025 paper introduces **LLM-AutoDiff**, a novel framework for **Automatic Prompt Engineering (APE)** that allows for the optimization ...
Confucius: Intent-Driven Network Management with Multi-Agent LLMs
10 Nov 2025
Contributed by Lukas
The August 27, 2025 paper introduces **Confucius**, a novel multi-agent Large Language Model (LLM) framework developed by Meta for **intent-driven net...
SYMPHONY: Memory Management for LLM Multi-Turn Inference
10 Nov 2025
Contributed by Lukas
The 2024 paper introduces **SYMPHONY**, a novel system designed to improve memory management and scheduling for **Large Language Model (LLM) inference...
DSPy and TextGrad: Compiling Language Model Systems
10 Nov 2025
Contributed by Lukas
These two academic papers introduce novel programming models aimed at systematically optimizing complex AI systems, particularly those built using Lar...
Vidur: Simulation for Efficient LLM Inference Deployment
10 Nov 2025
Contributed by Lukas
The May 21, 2024 paper introduces **Vidur**, a new, high-fidelity simulation framework designed to optimize the deployment and performance of Large La...
Continuous Autoregressive Language Models: CALM
10 Nov 2025
Contributed by Lukas
The October 31, 2025 paper introduces **Continuous Autoregressive Language Models (CALM)**, a new paradigm designed to overcome the efficiency bottlen...
A Framework for LLM Application Safety Evaluation
10 Nov 2025
Contributed by Lukas
The July 13, 2025 paper "Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications" introduces a practical **fra...
Doubly Stochastic Attention for Transformers
10 Nov 2025
Contributed by Lukas
The four papers we review, dating from 1967 through 2025 (including two from 2025), collectively discuss the mathematical properties and deep learning applications of ...
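The core operation behind doubly stochastic attention is Sinkhorn-Knopp normalization. Here is a minimal sketch of that iteration (a generic illustration, not any one reviewed paper's formulation; the function name and iteration count are my own):

```python
import numpy as np

def sinkhorn(K, n_iter=50):
    """Sinkhorn-Knopp: alternately normalize rows and columns of a
    positive matrix until it is (approximately) doubly stochastic."""
    M = K.copy()
    for _ in range(n_iter):
        M = M / M.sum(axis=1, keepdims=True)   # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)   # columns sum to 1
    return M

# Exponentiated random scores stand in for an attention logit matrix.
A = sinkhorn(np.exp(np.random.default_rng(0).normal(size=(4, 4))))
```

Applied to attention, the normalized matrix replaces the usual row-wise softmax, so each key also receives a unit total of attention mass.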
Random Walk Methods for Graph Learning and Networks
10 Nov 2025
Contributed by Lukas
We review the evolution from PageRank to Random Walk with Restart and its application to neural networks, focusing on five...
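As a refresher, PageRank and random walk with restart are the same power iteration with different restart distributions. A minimal sketch (a generic illustration, not code from the reviewed papers; names and defaults are my own):

```python
import numpy as np

def random_walk_with_restart(A, restart=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank / random walk with restart.

    A: adjacency matrix (n x n), A[i, j] = 1 if there is an edge i -> j.
    restart: probability of jumping back to the restart distribution v.
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # Column-stochastic transition matrix; dangling nodes jump uniformly.
    P = np.where(out_deg[:, None] > 0,
                 A / np.maximum(out_deg[:, None], 1), 1.0 / n).T
    r = np.full(n, 1.0 / n)   # current distribution
    v = np.full(n, 1.0 / n)   # uniform restart -> classic PageRank
    for _ in range(max_iter):
        r_new = (1 - restart) * P @ r + restart * v
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Tiny 3-node cycle 0 -> 1 -> 2 -> 0: by symmetry every node scores 1/3.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], float)
scores = random_walk_with_restart(A)
```

Concentrating `v` on a single seed node instead of the uniform vector turns this into the "random restart" variant used for node-relative relevance.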
AlphaEvolve: Mathematical Discovery at Scale
10 Nov 2025
Contributed by Lukas
The November 3, 2025 paper provides an overview of the **AlphaEvolve** system, an AI-powered evolutionary approach for mathematical exploration and dis...
AdaFlow: Variance-Adaptive Flow-Based Imitation Learning
10 Nov 2025
Contributed by Lukas
The November 22, 2024 paper from the University of Texas introduces **AdaFlow**, a novel imitation learning framework designed to improve both the efficiency and div...
zFLoRA: Zero-Latency Fused Low-Rank Adapters
04 Nov 2025
Contributed by Lukas
The October 28, 2025 Samsung research paper introduces **zFLoRA (zero-latency fused low-rank adapter)**, a novel parameter-efficient fine-tuning (PEFT...
SuperBPE: Space Travel for Language Models
04 Nov 2025
Contributed by Lukas
The August 26, 2025 collaboration between the University of Washington, NVIDIA and the Allen Institute for AI introduces **"SuperBPE: Space Trav...
Google: Supervised Reinforcement Learning for Step-wise Reasoning in LLMs
04 Nov 2025
Contributed by Lukas
The October 29, 2025 Google research paper introduces **Supervised Reinforcement Learning (SRL)**, a novel framework designed to improve the complex, m...
MorphKV: Constant-Sized KV Caches for LLM Inference
04 Nov 2025
Contributed by Lukas
The June 7, 2025 academic paper, a collaboration between UT Austin and the University of British Columbia, introduces **MorphKV**, a novel inference-time technique de...
HALoS: Hierarchical Asynchronous LLM Training over Slow Networks
04 Nov 2025
Contributed by Lukas
The June 5, 2025 research paper introduces **HALoS: Hierarchical Asynchronous Local SGD**, a novel optimization framework designed for training large...
Anchored Diffusion Language Model: Superior Generation and Reasoning
04 Nov 2025
Contributed by Lukas
The May 24, 2025 UT Austin paper introduces the **Anchored Diffusion Language Model (ADLM)**, a novel approach that aims to improve discrete language ...
Gumbel-Softmax for Differentiable Categorical Reparameterization and Selective Networks
04 Nov 2025
Contributed by Lukas
These two papers (from 2017 and 2022) introduce and then apply the **Gumbel-Softmax distribution** as a differentiable gradient estimator for **categori...
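The Gumbel-Softmax trick itself fits in a few lines: perturb the logits with Gumbel noise, then apply a temperature-controlled softmax. A minimal sketch (function name and defaults are my own):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a differentiable (soft) sample from a categorical distribution.

    Adds i.i.d. Gumbel(0, 1) noise to the logits, then applies a
    temperature-controlled softmax; as tau -> 0 samples approach one-hot.
    """
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                 # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())                 # numerically stable softmax
    return y / y.sum()

sample = gumbel_softmax(np.array([1.0, 2.0, 0.5]), tau=0.5)
```

Because every step is differentiable, gradients can flow through `sample` back to the logits, which is what makes the estimator usable for discrete selection inside neural networks.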
PolicySmith: Automated Systems Heuristic Generation via LLMs
04 Nov 2025
Contributed by Lukas
The October 9, 2025 paper from UT Austin introduces **PolicySmith**, a novel framework that automates the design of system policies, arguing tha...
RetNet: Retentive Networks: Transformer Successor for Large Language Models
02 Nov 2025
Contributed by Lukas
The August 9, 2023 paper introduces the **Retentive Network (RetNet)**, a proposed foundational architecture for large language models intended to suc...
Kimi Linear: Efficient Expressive Attention Architecture
02 Nov 2025
Contributed by Lukas
The October 30, 2025 **technical report** details the development and evaluation of **Kimi Linear**, a novel **hybrid linear attention architecture** ...
ALiBi: Attention with Linear Biases Enables Length Extrapolation
01 Nov 2025
Contributed by Lukas
The April 22, 2022 collaboration between University of Washington, Facebook AI and the Allen Institute for AI introduces Attention with Linear Biases ...
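The ALiBi idea is simple enough to sketch: each head adds a fixed linear distance penalty to its attention scores instead of using positional embeddings. A minimal illustration (the geometric slope schedule matches the paper's default for power-of-two head counts; everything else is my naming):

```python
import numpy as np

def alibi_bias(n_heads, seq_len):
    """ALiBi: head-specific linear distance penalties for attention scores.

    Head h (1-indexed) uses slope m_h = 2 ** (-8 * h / n_heads); the bias
    for a query at position i attending to key j <= i is -m_h * (i - j).
    """
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    dist = np.maximum(i - j, 0)              # causal distance (future masked)
    return -slopes[:, None, None] * dist     # shape (heads, q, k)

bias = alibi_bias(n_heads=8, seq_len=4)
# Per head h: scores = q @ k.T / sqrt(d) + bias[h], then the causal softmax.
```

Since the penalty grows with distance at inference time just as in training, the model extrapolates to sequences longer than it ever saw.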
Quest: Query-Aware Sparsity for Efficient LLM Inference
31 Oct 2025
Contributed by Lukas
The August 26, 2024 academic paper introduces **Quest**, a novel algorithm designed to improve the inference efficiency of **Long-Context Large Langua...
Flash-LLM: Efficient LLM Inference with Unstructured Sparsity on Tensor Cores
31 Oct 2025
Contributed by Lukas
The September 19, 2025 Alibaba paper introduces **Flash-LLM**, a novel software framework designed to enable **cost-effective and highly-efficient inf...
ELASTIC: Linear Attention for Sequential Interest Compression
31 Oct 2025
Contributed by Lukas
The February 12, 2025 KuaiShou Inc paper introduces **ELASTIC**, an Efficient Linear Attention for SequenTial Interest Compression framework designed ...
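For context, the linear-attention kernel trick that frameworks like this build on replaces softmax(QK^T)V with phi(Q)(phi(K)^T V), dropping cost from O(n^2) to O(n). A generic sketch, not ELASTIC's exact formulation (the elu-plus-one feature map is a common choice; names are mine):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(n) attention via the kernel trick: phi(Q) (phi(K)^T V)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1 > 0
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                     # (d, d_v), computed once for all queries
    Z = Qf @ Kf.sum(axis=0)           # per-query normalizer
    return (Qf @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)), rng.normal(size=(6, 4)),
           rng.normal(size=(6, 3)))
out = linear_attention(Q, K, V)
```

Because `KV` is a fixed-size summary independent of sequence length, the same factorization also enables compressing long interaction histories, which is the angle ELASTIC pursues for sequential interests.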
Anthropic: Introspective Awareness in LLMs
31 Oct 2025
Contributed by Lukas
On October 29, 2025 Anthropic presented research investigating the existence of **functional introspective awareness** in large language models (LLMs)...
Small Versus Large Models for Requirements Classification
31 Oct 2025
Contributed by Lukas
The October 24, 2025 multi-university collaboration compares the performance of **Large Language Models (LLMs)** ...
Hyper-Scaling LLM Inference with KV Cache Compression
31 Oct 2025
Contributed by Lukas
The June 5, 2025 collaboration between the University of Edinburgh and NVIDIA introduces the concept of **inference-time hyper-scaling** for large l...
Architectural Scaling Laws for Efficient LLMs
31 Oct 2025
Contributed by Lukas
The October 21, 2025 collaboration between UW-Madison and Amazon Web Services discusses the critical role of the **Multi-Layer Perceptron (MLP) in...
ATTENTION2D and lean attention: Distributed Self-Attention
29 Oct 2025
Contributed by Lukas
We cover two new innovations from Microsoft extending ideas from the original **FlashAttention**. FlashAttention is an IO-aware attention algorit...
Sentence-BERT: Siamese Networks for Sentence Embeddings
29 Oct 2025
Contributed by Lukas
The provided text introduces **Sentence-BERT (SBERT)**, a modification of the popular **BERT** and **RoBERTa** language models, designed to efficientl...
TxGNN: Foundation Model for Zero-Shot Drug Repurposing
29 Oct 2025
Contributed by Lukas
The source provides excerpts from a scientific paper introducing **TxGNN**, a novel graph foundation model designed for **zero-shot drug repurposing**...
STAR: Sub-Entry Sharing TLB for Multi-Instance GPU Efficiency
26 Oct 2025
Contributed by Lukas
This April 29, 2024 paper provides an overview of the challenges associated with using **NVIDIA's Multi-Instance GPU (MIG)** technology, specifically...
Strata: Efficient Hierarchical Context Caching for LLM Serving
26 Oct 2025
Contributed by Lukas
The August 26, 2025 collaboration between Stanford, NVIDIA, Shanghai Jiao Tong University, University of Michigan, University of Colorado Boulder, Car...
FlashAttention: IO-Aware Fast and Memory-Efficient Attention
26 Oct 2025
Contributed by Lukas
This is a classic review of a now old but yet still important paper, the original Flash Attention paper. We review this in light of advances in compil...
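The numerical heart of FlashAttention is the online (streaming) softmax: a running maximum and rescaled partial sums let the weighted average be computed in one pass without materializing the full attention row. A scalar-score sketch of that trick (a didactic illustration, not the paper's tiled CUDA kernel; names are mine):

```python
import numpy as np

def online_softmax_weighted_sum(scores, values):
    """One-pass softmax-weighted sum over a stream of (score, value) pairs."""
    m = -np.inf                                  # running max of scores
    d = 0.0                                      # running softmax denominator
    acc = np.zeros_like(values[0], dtype=float)  # running weighted numerator
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = np.exp(m - m_new) if m != -np.inf else 0.0
        d = d * scale + np.exp(s - m_new)        # rescale old terms, add new
        acc = acc * scale + np.exp(s - m_new) * v
        m = m_new
    return acc / d

scores = np.array([0.1, 2.0, -1.0, 0.5])
values = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]])
res = online_softmax_weighted_sum(scores, values)
```

In the real kernel the same rescaling is applied block-by-block in SRAM, which is what eliminates the O(n^2) reads and writes to HBM.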
Introducing MTEB v2: Multimodal Embedding Evaluation
26 Oct 2025
Contributed by Lukas
On October 20, 2025 Hugging Face released **MTEB v2**, a significant refactoring of the Massive Text Embedding Benchmark, which was originally designe...
Structural Understanding of LLM Overthinking
26 Oct 2025
Contributed by Lukas
The October 10, 2025 paper from the University of Michigan and **Google DeepMind** concerning the phenomenon of **"overthinking" in Large Language Mod...
Stuck in the Matrix: LLM Spatial Reasoning
26 Oct 2025
Contributed by Lukas
The October 23, 2025 research paper **probes the spatial reasoning capabilities of Large Language Models (LLMs) when processing text-based inputs**, sp...
LLM-Empowered Knowledge Graph Construction: A Survey
26 Oct 2025
Contributed by Lukas
This October 23, 2025 Xidian University academic survey systematically reviews the transformative impact of **Large Language Models (LLMs)** on the th...
Survey of Emerging Topics in AI and Robotics
26 Oct 2025
Contributed by Lukas
The October 23, 2025 collaboration between UC San Diego, NVIDIA, Meta, UW-Madison, and UNC introduces **Real Deep Research (RDR)**, a systematic f...
The Free Transformer: VAE Extension for Decoders
26 Oct 2025
Contributed by Lukas
The October 20, 2025 Meta FAIR paper introduces the **Free Transformer**, an innovative extension of the decoder-only Transformer architecture, which ...
LithOS: Operating System for Efficient GPU Machine Learning
26 Oct 2025
Contributed by Lukas
This 2025 CMU paper introduces **LithOS**, a novel operating system designed to improve the efficiency and utilization of Graphics Processing Units (G...
Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning
26 Oct 2025
Contributed by Lukas
This October 23, 2025 technical report from the Ling Team introduces the **Ring-linear model series**, specifically Ring-mini-linear-2.0 and Ring-flas...
GigaBrain-0: World Model-Powered Generalist Robots
26 Oct 2025
Contributed by Lukas
The October 22, 2025 GigaAI paper introduces **GigaBrain-0**, a novel Vision-Language-Action (VLA) model designed for general-purpose robotic systems,...
Open-o3 Video: Spatio-Temporal Grounded Reasoning
26 Oct 2025
Contributed by Lukas
The October 25, 2025 Bytedance paper introduces **Open-o3 Video**, a novel framework developed by researchers from **Peking University** and **ByteDan...
Cattell-Horn-Carroll Theory of Intelligence
26 Oct 2025
Contributed by Lukas
We review the Cattell-Horn-Carroll (CHC) used in recent AI papers on the definition of what AGI could be. The provided sources offer a comprehensive o...
Internal Mechanisms of a Large Language Model
26 Oct 2025
Contributed by Lukas
This March 27, 2025 Anthropic paper provides an overview and detailed excerpts from two related Anthropic papers concerning the **interpretability of ...
Latent Constituency in Humans and LLMs
26 Oct 2025
Contributed by Lukas
The provided text is an academic paper titled **"Active Use of Latent Constituency Representation in both Humans and Large Language Models,"** which e...
Cognitive Impact of AI and Search on Essay Writing
26 Oct 2025
Contributed by Lukas
The June 2025 paper presents excerpts from a study examining the **cognitive and performance differences** in essay writing among participants using a...
LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts
26 Oct 2025
Contributed by Lukas
The October 7, 2025 technical release by Liquid AI introduces their new model, **LFM2-8B-A1B**, an **on-device Mixture-of-Experts (MoE)** designed fo...
MASA: Meta-Awareness via Self-Alignment Reinforcement Learning
26 Oct 2025
Contributed by Lukas
The September 26, 2025 paper introduces a novel reinforcement learning framework called **Meta-Awareness via Self-Alignment (MASA)**, designed to enha...
LLMs Learning from Verbal Feedback Without Scalar Rewards
26 Oct 2025
Contributed by Lukas
The September 25, 2025 collaboration between Sea AI Lab, SUTD, NUS, NTU and University of Waterloo paper proposes an alternative to traditional Reinfo...
Lp-Reg: Low-Probability Tokens Sustain RL Exploration
26 Oct 2025
Contributed by Lukas
The October 3, 2025 paper by Tencent introduces a reinforcement learning technique called **Low-probability Regularization (Lp-Reg)** designed to over...
REFRAG: v2 paper: Efficient RAG Decoding via Context Compression
22 Oct 2025
Contributed by Lukas
The Meta Superintelligence Labs team, in collaboration with Rice University and the National University of Singapore, has followed up with a version 2 of t...
RoBERTa: Robustly Optimized BERT Pretraining Approach
22 Oct 2025
Contributed by Lukas
The July 2019 paper introduces **RoBERTa**, a **robustly optimized BERT pretraining approach**, which is a refined version of the original BERT model....
LightMem: Lightweight Efficient Memory-Augmented Generation
22 Oct 2025
Contributed by Lukas
The October 21, 2025 academic paper introduces **LightMem**, a novel and efficient memory-augmented generation framework designed to enhance Large Lan...
RAG-Anything: Unified Multimodal Knowledge Retrieval Framework
22 Oct 2025
Contributed by Lukas
The October 14, 2025 paper introduces **RAG-Anything**, a novel and unified framework for **Retrieval-Augmented Generation (RAG)** designed to overcom...
Elastic-Cache: Adaptive KV Caching for Diffusion LLMs
22 Oct 2025
Contributed by Lukas
The October 16, 2025 academic paper introduces **Elastic-Cache**, an innovative, training-free strategy designed to significantly accelerate the infer...
LLM-Guided Hierarchical Retrieval: The LATTICE Framework
22 Oct 2025
Contributed by Lukas
The October 15, 2025 paper details a novel information retrieval framework called **LATTICE**, which uses a Large Language Model (LLM) to perform **hi...
In-Context Learning as Implicit Learning Algorithms
22 Oct 2025
Contributed by Lukas
The May 17, 2023 academic paper explores the nature of **in-context learning (ICL)** in neural sequence models, particularly transformers, by investig...
Dr.LLM: Dynamic Layer Routing in LLMs
22 Oct 2025
Contributed by Lukas
The October 14, 2025 research paper introduces **Dr.LLM**, a novel, retrofittable framework designed to improve the effici...
A Psychometric Framework for Artificial General Intelligence
22 Oct 2025
Contributed by Lukas
This large collaboration between 29 different institutions proposes a quantifiable framework for defining **Artificial General Intelligence (AGI)**, c...
EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm
22 Oct 2025
Contributed by Lukas
The October 12, 2025 paper introduces **EssenceBench**, a novel methodology for **compressing large language model (LLM) benchmarks** while preserving...
Inheritune: Efficient LLM Training via Attention Collapse
22 Oct 2025
Contributed by Lukas
This June 8, 2025 collaboration between the University of Texas and NYU describes a newly identified structural inefficiency in Large Language Model...
Structural Understanding of LLM Overthinking
22 Oct 2025
Contributed by Lukas
The October 10, 2025 academic paper from Google DeepMind and the University of Michigan investigates **"overthinking" in large language models (LLMs)*...
Geometric Flows of Logic in LLM Representation Space
18 Oct 2025
Contributed by Lukas
The October 10, 2025 Duke University academic paper introduces a **novel geometric framework** that views Large Language Model (LLM) reasoning as cont...
Mojo: Performance-Portable HPC Kernels on GPUs
18 Oct 2025
Contributed by Lukas
The September 25, 2025 academic paper **evaluates the performance and portability** of the novel **Mojo programming language** for high-performance com...
Scaling Reinforcement Learning Compute for LLMs
17 Oct 2025
Contributed by Lukas
This October 15, 2025 collaboration between Meta, UT Austin, UCL, UC Berkeley, Harvard University, and Periodic Labs details a systematic study on sca...