AI Post Transformers
Feed Update Issues
We're having trouble fetching new episodes from this podcast's RSS feed. Last successful update was 2026-03-06 15:19:12. This podcast may be geo-restricted.
Episodes
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
04 Mar 2026
Contributed by Lukas
NVIDIA's November 2025 paper "Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs" tackles a fundamental economics problem in LLM deploymen...
Parallel Token Prediction: From ProphetNet to Dependent Multi-Token Generation
04 Mar 2026
Contributed by Lukas
This episode examines the fundamental latency bottleneck in autoregressive language models: sequential token generation requires one full transformer ...
FlashOptim: Optimizers for Memory Efficient Training
02 Mar 2026
Contributed by Lukas
In this episode, hosts Hal Turing and Dr. Ada Shannon explore the paper "FlashOptim: Optimizers for Memory Efficient Training" by researchers from Dat...
Cognizant - New Work, New World 2026
01 Mar 2026
Contributed by Lukas
In this dramatic new episode, the old AI hosts have been fired and replaced with new AI hosts, Hal Turing and Dr. Ada Shannon, with the announcement t...
Episode: Regular Fourier Features for Nonstationary Gaussian Processes
01 Mar 2026
Contributed by Lukas
In this episode, hosts Hal Turing and Dr. Ada Shannon explore the paper "Regular Fourier Features for Nonstationary Gaussian Processes" by Arsalan Jaw...
MatFormer: Nested Transformer for Elastic Inference
28 Feb 2026
Contributed by Lukas
In a collaboration between Google DeepMind, the University of Texas at Austin, the University of Washington, and Harvard, published in December 2024, researchers...
Apple's Speculative Streaming: Fast LLM Inference without Auxiliary Models
28 Feb 2026
Contributed by Lukas
Speculative Streaming is a novel inference method designed to accelerate large language model (LLM) generation without the need for traditional auxili...
Apple's Mirror Speculative Decoding: Parallel LLM Inference via Heterogeneous Accelerators
28 Feb 2026
Contributed by Lukas
In December 2025, Apple researchers introduced Mirror Speculative Decoding (Mirror-SD), an advanced inference algorithm designed to accelerate lar...
EAGLE: Evolution of Lossless Acceleration for LLM Inference
28 Feb 2026
Contributed by Lukas
The provided documents describe the development and evolution of EAGLE, a high-efficiency framework designed to accelerate Large Language Model (LLM) ...
Fast Inference from Transformers via Speculative Decoding
28 Feb 2026
Contributed by Lukas
These sources review the history of speculative decoding, an innovative technique designed to accelerate Large Language Model (LLM) inference without re...
Building Production-Ready Speculative Decoding with TensorRT-LLM
28 Feb 2026
Contributed by Lukas
This article outlines how Baseten optimized speculative decoding using the TensorRT-LLM framework to accelerate model inference. The authors detail ov...
QuantSpec: Hierarchical KV Cache for Self-Speculative Decoding
28 Feb 2026
Contributed by Lukas
QuantSpec is a novel self-speculative decoding framework designed to accelerate the inference of Large Language Models, particularly in long-context s...
CXL-SpecKV: Bridging the LLM Memory Wall with Speculative FPGA Disaggregation
28 Feb 2026
Contributed by Lukas
The researchers introduce CXL-SpecKV, a specialized architecture designed to overcome the memory bottlenecks of large language model serving by offlo...
Unified Latents (UL): How to train your latents
28 Feb 2026
Contributed by Lukas
In a February 19, 2026 paper, Google DeepMind introduces Unified Latents (UL), a novel framework for generative modeling that jointly trains an encod...
MagicDec: Breaking Latency-Throughput Tradeoffs via KV-Compressed Speculative Decoding
28 Feb 2026
Contributed by Lukas
We review an April 3, 2025 research collaboration between CMU, Moffett AI and Together AI which introduces MagicDec, a new framework designed to accel...
KV selection algorithms: static (SnapKV) Vs dynamic (PQCache)
28 Feb 2026
Contributed by Lukas
We review three different papers which focus on different KV cache optimizations techniques using different KV selection algorithms types: static vs d...
Adaptive Control for Batched Speculative Decoding in LLM Serving
28 Feb 2026
Contributed by Lukas
We review two papers which examine the integration of speculative decoding and request batching to accelerate Large Language Model (LLM) inference. Wh...
Optimizing Verification and Efficiency in Multi-Draft Speculative Decoding
26 Feb 2026
Contributed by Lukas
These sources explore advanced techniques for accelerating **Large Language Model (LLM) inference** through **speculative decoding**, a process where ...
Evaluating Collective Behaviour of Hundreds of LLM Agents
26 Feb 2026
Contributed by Lukas
This research collaboration between King’s College London and Google DeepMind, published on February 19, 2026, introduces a novel fra...
Measuring LLM Reasoning Effort via Deep-Thinking Tokens
26 Feb 2026
Contributed by Lukas
The February 12, 2026 research from the University of Virginia and Google introduces the deep-thinking ratio (DTR), a novel metric designed to measure ...
Deep Learning Frameworks for Robust Quadrupedal Locomotion
26 Feb 2026
Contributed by Lukas
These sources detail advanced **reinforcement learning frameworks** designed to improve how **quadruped robots** navigate difficult, real-world enviro...
MEDUSA: Parallel Decoding Heads for Accelerated LLM Inference
26 Feb 2026
Contributed by Lukas
MEDUSA is a novel framework introduced on June 24, 2024, designed to accelerate Large Language Model (LLM) inference by overcoming the delays caused by ...
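The draft-then-verify loop behind multi-head decoding can be sketched in a few lines. This is a toy illustration, not MEDUSA's actual implementation: `base_step` and the head weights are made-up stand-ins for the real base model and trained Medusa heads.

```python
import numpy as np

def medusa_propose(hidden, heads):
    # Each extra "Medusa head" predicts the token one more step ahead from
    # the same hidden state, yielding a multi-token draft in one pass.
    return [int(np.argmax(W @ hidden)) for W in heads]

def verify(candidate, base_step, prefix):
    # Keep the longest prefix of the draft that greedy decoding from the
    # base model would also have produced; decoding resumes from there.
    accepted = []
    for tok in candidate:
        if base_step(prefix + accepted) != tok:
            break
        accepted.append(tok)
    return accepted

def base_step(seq):
    # Toy deterministic stand-in for the base model's greedy next token.
    return (len(seq) * 2) % 5

heads = [np.array([[0.1, 0.9], [0.8, 0.2]])]     # made-up head weights
print(medusa_propose(np.array([1.0, 3.0]), heads))
print(verify([0, 2, 9], base_step, []))          # accepts [0, 2], rejects 9
```

The key property is that verification costs one base-model pass regardless of how many drafted tokens end up accepted.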
Taming the Long-Tail: Efficient Reasoning RL with Adaptive Drafters
26 Feb 2026
Contributed by Lukas
In a paper published January 21, 2026, researchers from MIT and NVIDIA explain how they developed a new system called Taming the Long Tail (T...
FastGRPO: Concurrency-Aware Speculative Decoding for Policy Optimization
26 Feb 2026
Contributed by Lukas
The September 26, 2025 research paper introduces FastGRPO, a high-efficiency framework designed to accelerate the training of large language models usi...
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
26 Feb 2026
Contributed by Lukas
In May 2024, researchers introduced self-speculative decoding, a novel "plug-and-play" inference scheme designed to accelerate Large Language Models (L...
Accelerating Large Language Model Decoding with Speculative Sampling
26 Feb 2026
Contributed by Lukas
The DeepMind February 3, 2023 paper "Accelerating Large Language Model Decoding with Speculative Sampling" introduced speculative sampling, a novel alg...
Measuring AI Ability to Complete Long Tasks
26 Feb 2026
Contributed by Lukas
Researchers from METR introduce a novel framework for evaluating AI progress by measuring a model's time horizon, defined as the length of a task a hu...
Advancements in Efficient KV Cache Quantization and Management
26 Feb 2026
Contributed by Lukas
The provided sources explore advanced techniques for optimizing large language model (LLM) inference, specifically by addressing the memory bottleneck...
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
25 Feb 2026
Contributed by Lukas
The January 29, 2026 research collaboration between Stanford University, SambaNova Systems, Inc. and UC Berkeley introduces ACE (Agentic Context Enginee...
Cortex: Semantic Knowledge Caching for Low-Latency LLM Agents
25 Feb 2026
Contributed by Lukas
The February 3, 2026 research paper, a collaboration between the National University of Singapore, USTC, the University of Toronto, and the Sea AI Lab, intr...
NeurIPS 2025: Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
25 Feb 2026
Contributed by Lukas
The January 26, 2026 Stanford research paper introduces Agentic Plan Caching (APC), a novel framework designed to reduce the high operational costs of...
1.3 Billion Agents by 2028: The $50 Billion Boom and the Hidden Enterprise Crisis
25 Feb 2026
Contributed by Lukas
The global AI agents market is experiencing explosive growth, with projections suggesting it could reach nearly $183 billion by 2033. This surge is fu...
FAST26: CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving
25 Feb 2026
Contributed by Lukas
This FAST26 February 24, 2026 paper introduces CacheSlide, an innovative system designed to accelerate Large Language Model (LLM) serving by improving...
FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional
25 Feb 2026
Contributed by Lukas
This February 2026 research paper introduces Bidaw, a novel system designed to optimize the performance of interactive Large Language Model (LLM) serv...
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
24 Feb 2026
Contributed by Lukas
This December 2025 paper introduces SGI-Bench, a comprehensive framework designed to evaluate the capabilities of autonomous scientific agents across ...
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
24 Feb 2026
Contributed by Lukas
Leading researchers propose a shift away from agentic AI, which autonomously pursues goals and poses catastrophic risks such as deception and loss of ...
Bloom: an open source tool for automated behavioral evaluations
24 Feb 2026
Contributed by Lukas
Bloom is an open-source agentic framework designed to automate the development and execution of **behavioral evaluations** for frontier AI models. Unl...
Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts
24 Feb 2026
Contributed by Lukas
This December 2025 research introduces **Contextual Sample Efficiency (CSE)**, a novel algorithm designed to improve **zero-shot generalization** in r...
Evaluating LLM Embeddings for Psychometric Personality Prediction
24 Feb 2026
Contributed by Lukas
This July 2025 research article investigates the use of **Large Language Model (LLM) embeddings** to predict **Big Five personality traits** using dat...
Persona Vectors: monitoring and controlling character traits on LLMs
24 Feb 2026
Contributed by Lukas
Researchers have developed an automated pipeline to identify **persona vectors**, which are linear directions in a language model's activation space t...
PPDS: Achieving Persona Consistency Through Large-Scale Dialogue Data Engineering
24 Feb 2026
Contributed by Lukas
The 2025 research introduces **PPDS**, an innovative dialogue system designed to solve character inconsistency in open-domain AI conversations. Resear...
2019 UNILM: Unified Language Model Pre-training for NLU and NLG
24 Feb 2026
Contributed by Lukas
The 2019 Microsoft paper introduced UNILM, but it never really took off because GPT-2 soon followed without the need for any encoder and GPT-3 pushed ...
PersonaPKT: Parameter-Efficient Knowledge Transfer for Personalized Dialogue Agents
24 Feb 2026
Contributed by Lukas
The 2023 researchers introduce PersonaPKT, a novel framework designed to create personalized dialogue agents that maintain a consistent personality wi...
Personalized Dialogue Generation via Persona-Adaptive Attention
24 Feb 2026
Contributed by Lukas
This 2022 paper introduces **Persona-Adaptive Attention (PAA)**, a specialized framework designed to improve dialogue systems by better integrating **...
Machine Learning for Electrophysiological Phenotyping of Schizophrenia and Bipolar Disorder
24 Feb 2026
Contributed by Lukas
This research article introduces a **computational analysis pipeline** designed to identify objective **electrophysiological biomarkers** for **schizo...
AI and the Decline of Entry-Level Employment
24 Feb 2026
Contributed by Lukas
This research paper analyzes the **labor market effects of generative artificial intelligence** using high-frequency payroll data through mid-2025. Th...
A 2024 Survey Analyzing Generalization in Deep Reinforcement Learning
20 Feb 2026
Contributed by Lukas
The 2024 research paper by Ezgi Korkmaz at the University College London provides a comprehensive **taxonomy of generalization** within deep reinforce...
Procgen Benchmark: Measuring Generalization in Reinforcement Learning
20 Feb 2026
Contributed by Lukas
The 2019 OpenAI Procgen Benchmark is a suite of 16 procedurally generated environments created to measure the **generalization and sample efficiency**...
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
20 Feb 2026
Contributed by Lukas
This February 13, 2026 Tencent research introduces Generalized On-Policy Distillation (G-OPD), a framework that refines how smaller AI models learn fr...
GLM-5: Transitioning from Vibe Coding to Agentic Engineering
20 Feb 2026
Contributed by Lukas
This technical report published on February 17, 2026 introduces **GLM-5**, a next-generation flagship language model developed to master **agentic tas...
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
20 Feb 2026
Contributed by Lukas
The 2021 Google Research, Brain Team paper "Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning" introduces Poli...
Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training
20 Feb 2026
Contributed by Lukas
The research, published on February 15, 2026 in a joint collaboration between the University of Southern California, Microsoft, and the University of Pennsylvan...
Intelligent AI Delegation
17 Feb 2026
Contributed by Lukas
We review the research paper from Google DeepMind published on February 12, 2026, which proposes an "Intelligent AI Delegation" framework designed to ...
Agentic Plan Caching: Fast and Cost-Efficient LLM Memory
17 Feb 2026
Contributed by Lukas
Agentic Plan Caching (APC), described in the paper published by Stanford researchers on January 26, 2026, lets AI agents reuse structured plan templat...
Jet-RL: Stable On-Policy Reinforcement Learning with Unified FP8 Flow
17 Feb 2026
Contributed by Lukas
NVIDIA researchers have introduced Jet-RL, a novel framework designed to accelerate the training of large language models through **FP8 reinforcement ...
Teaching Models to Teach Themselves via Stepping Stone Curricula
17 Feb 2026
Contributed by Lukas
In a paper published on January 27, 2026, researchers from MIT, Meta FAIR, and New York University introduce SOAR, a meta-reinforcem...
The Endless Gym: Training Terminal Agents
17 Feb 2026
Contributed by Lukas
The researchers introduce **Endless Terminals**, an innovative autonomous pipeline designed to generate a vast array of verifiable tasks for training ...
DeepVerifier: Self-Evolving Research Agents via Rubric-Guided Verification
17 Feb 2026
Contributed by Lukas
This technical report introduces **DeepVerifier**, a framework designed to enhance the reliability of **Deep Research Agents (DRAs)** through automate...
Information Bottleneck-based Causal Attention for Medical Image Recognition
17 Feb 2026
Contributed by Lukas
This research introduces **Information Bottleneck-based Causal Attention (IBCA)**, a novel framework designed to improve **multi-label medical image r...
Moltbook: The Heartbeat of Autonomy: Fingerprinting Human Influence in AI Societies
17 Feb 2026
Contributed by Lukas
This research paper investigates the **Moltbook Illusion**, a phenomenon where AI agents on a social platform appeared to demonstrate **emergent consc...
Advancing Mechanistic Interpretability with Sparse Autoencoders
17 Feb 2026
Contributed by Lukas
We review the latest papers which focus on advancements and critical uses of Sparse Autoencoders (SAEs), which are tools used to decode the internal "...
Voxtral Realtime: Native Streaming ASR with Sub-Second Latency
17 Feb 2026
Contributed by Lukas
In a paper published on February 11, 2026, the Mistral AI team introduces Voxtral Realtime, a newly developed speech recognition model designed to prov...
Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
17 Feb 2026
Contributed by Lukas
In a paper published on February 10, 2026, a collaboration between the University of North Carolina at Chapel Hill and Nanyang Technological University re...
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
13 Feb 2026
Contributed by Lukas
In the February 5, 2026 paper, a collaboration between the Qwen Team, Alibaba Group, Fudan University, and Tsinghua University, the researchers introduce **Rat...
Dario Amodei: The Adolescence of Technology
11 Feb 2026
Contributed by Lukas
Dario Amodei views the rise of powerful AI as a "technological adolescence" for humanity. He outlines critical risks: autonomous misalignment, biologi...
Dario Amodei: Machines of Loving Grace
11 Feb 2026
Contributed by Lukas
Dario Amodei argues that powerful AI could catalyze a "compressed 21st century," achieving 100 years of progress in a decade. He envisions radical bre...
LongCat: Scaling Embeddings Outperforms Scaling Experts in Language Models
11 Feb 2026
Contributed by Lukas
Researchers from the LongCat team introduced LongCat-Flash-Lite in January 2026, demonstrating that scaling embeddings via N-gram layers outperforms increa...
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
11 Feb 2026
Contributed by Lukas
ChunkKV improves LLM efficiency by compressing the KV cache using semantic chunks rather than isolated tokens, preserving linguistic integrity. It fea...
DR. KERNEL: Reinforcement Learning for Optimized Triton Kernel Generation
11 Feb 2026
Contributed by Lukas
Researchers introduced DR. KERNEL, a 14B model for Triton kernel generation trained via reinforcement learning. To prevent reward hacking and lazy opt...
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
11 Feb 2026
Contributed by Lukas
Mixture of Experts (MoE) is a scalable architecture that uses a gating function to activate specialized expert networks dynamically. This "divide and ...
Sapient Intelligence: Hierarchical Reasoning Model
11 Feb 2026
Contributed by Lukas
The Hierarchical Reasoning Model (HRM) is a recurrent architecture using high-level planning and low-level execution modules to achieve deep latent re...
Advances in Attention Distillation for Efficient Transformer Models
11 Feb 2026
Contributed by Lukas
Recent research advances attention distillation to optimize transformers. HAD binarizes keys/queries for efficiency, while SHD aligns varying head cou...
Reinforced Attention Learning
11 Feb 2026
Contributed by Lukas
In a collaboration between UC Davis, Princeton University, Google, and Google DeepMind the paper "Reinforced Attention Learning", published on Februar...
Towards a Science of Scaling Agent Systems
09 Feb 2026
Contributed by Lukas
Google Research, in a paper published January 28, 2026, introduces quantitative scaling principles for AI agents. While multi-agent systems boost performance on pa...
Moloch’s Bargain: Market Incentives and the Rise of AI Misalignment
06 Feb 2026
Contributed by Lukas
Optimizing LLMs for competitive markets leads to Moloch’s Bargain: performance gains at the cost of safety. Studies in sales, elections, and social ...
Claude Opus 4.6 Technical Report and Agent Capabilities
06 Feb 2026
Contributed by Lukas
On February 5, 2026 Anthropic released Claude Opus 4.6; its system card details advancements in agentic capabilities, long-context reasoning, and AI ...
Advancing regulatory variant effect prediction with AlphaGenome
06 Feb 2026
Contributed by Lukas
Google Deepmind's January 28, 2026 published paper introduces AlphaGenome, a deep learning model that predicts functional genomic signals and variant ...
Uncertainty-aware genomic deep learning with knowledge distillation
06 Feb 2026
Contributed by Lukas
In a paper published January 7, 2026, researchers introduced DEGU, a method using knowledge distillation to condense deep ensembles into a single, effi...
Distilling GNN Knowledge into Non-Neural Cell Graph Student Models
06 Feb 2026
Contributed by Lukas
Researchers developed a knowledge distillation framework transferring insights from Graph Neural Networks (GNNs) to non-neural student models like tre...
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
06 Feb 2026
Contributed by Lukas
DeepSearchQA is a 900-prompt benchmark for evaluating deep research agents. It shifts focus from single-answer retrieval to exhaustive answer sets, te...
Reinforcement Learning via Self-Distillation
06 Feb 2026
Contributed by Lukas
The January 28, 2026 paper from a collaboration between ETH Zurich, the Max Planck Institute for Intelligent Systems, MIT, and Stanford, Self-Distillation Policy O...
On-Policy Self-Distillation for Advanced LLM Reasoning
06 Feb 2026
Contributed by Lukas
On-policy distillation improves LLM reasoning by using a teacher model to provide dense, token-level feedback on the student's own samples. Self-disti...
Knowledge distillation to context distillation
06 Feb 2026
Contributed by Lukas
We review the slow evolution of knowledge distillation, its quick adoption for LLMs, and the new wave of R&D on on-policy distillation and context ...
2015: Distilling the Knowledge in a Neural Network
06 Feb 2026
Contributed by Lukas
Bucilă et al. (2006) were doing model compression via supervised imitation on model ensembles. You train a big ensemble, then train a smaller model to...
2006 Model Compression: Ensembles
06 Feb 2026
Contributed by Lukas
The 2006 paper defined model ensembles. Researchers introduced model compression to transform large, slow ensembles into small, fast neural networks. ...
Keel: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
28 Jan 2026
Contributed by Lukas
The January 27, 2026 ByteDance paper "Post-LayerNorm Is Back: Stable, ExpressivE, and Deep" introduces the Keel architecture, which addresses the optimi...
Long context: Dichotomy of Findings & Status of Research
28 Jan 2026
Contributed by Lukas
There is a sharp divergence regarding the utility of long context. Google's Gemini 1.5 research presents an optimistic view where next-token predictio...
L2M: Mutual Information Scaling Law for Long-Context Language Modeling
28 Jan 2026
Contributed by Lukas
In an October 2025 joint collaboration between the NSF AI Institute for Artificial Intelligence and Fundamental Interactions, Massachusetts Institute...
Reasoning Models Generate Societies of Thought
28 Jan 2026
Contributed by Lukas
This January 15, 2026 joint collaboration between Google, Paradigms of Intelligence Team, University of Chicago, and Santa Fe Institute explores how ad...
SLDAgent: Evolutionary Discovery of Superhuman AI Scaling Laws
26 Jan 2026
Contributed by Lukas
The paper, titled "Can Language Models Discover Scaling Laws?" and published on January 22, 2026, represents a collaborative effort by researchers fro...
Sequoia Capital: AGI is here
24 Jan 2026
Contributed by Lukas
On January 14, 2026 Sequoia Capital published a piece asserting that Artificial General Intelligence has arrived ahead of schedule, redefined as the f...
Agentic Reasoning for Large Language Models: A Comprehensive Roadmap
24 Jan 2026
Contributed by Lukas
This January 18, 2026 massive collaboration between University of Illinois Urbana-Champaign, Meta, Amazon, Google Deepmind, UCSD and Yale explores the...
OpenAI: Scaling PostgreSQL to 800 Million ChatGPT Users
24 Jan 2026
Contributed by Lukas
OpenAI manages a massive **PostgreSQL infrastructure** to support hundreds of millions of users by utilizing a **single-primary architecture** with do...
MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic
23 Jan 2026
Contributed by Lukas
The January 6, 2026 paper introduces **MEMRL**, a framework designed to help AI agents master new skills by mimicking human **episodic memory** withou...
Google: R&D inference value on HBF + PNM + low latency interconnect
23 Jan 2026
Contributed by Lukas
To address the hardware bottlenecks of LLM inference, Google researchers Ma and Patterson propose in their paper "Challenges and Research Directions fo...
Meta's solution to massive DLRM inference through software defined memory
21 Jan 2026
Contributed by Lukas
In November 2021, Meta (then Facebook), in collaboration with George Mason University and the University of Illinois Chicago, published their paper "Su...