
AI Post Transformers

Technology

Feed Update Issues

We're having trouble fetching new episodes from this podcast's RSS feed. The last successful update was 2026-03-06 15:19:12. This podcast may be geo-restricted.

Episodes

Showing 101-200 of 458
«« ← Prev Page 2 of 5 Next → »»

Storage-next: Do We Need New Hardware for AI Storage, or Just Better Layouts?

21 Jan 2026

Contributed by Lukas

We review the "Storage-Next" paper, published in November 2025, which argues that a fundamental hardware architectural shift is required to elevate NA...

LeCun's AMI Energy-Based Models and the Path to Autonomous Intelligence

21 Jan 2026

Contributed by Lukas

These sources collectively explore the current landscape and future trajectory of artificial intelligence, specifically focusing on the transition tow...

MemoBrain: Executive Memory for Tool-Augmented Reasoning Agents

19 Jan 2026

Contributed by Lukas

On January 12, 2026, a collaboration between the Beijing Academy of Artificial Intelligence, the Gaoling School of Artificial Intelligence, and Renmin...

Parallel Context-of-Experts Decoding for Efficient RAG Reasoning

19 Jan 2026

Contributed by Lukas

A collaboration between SAP Labs France and EURECOM France published a paper on January 13, 2026 titled "Parallel Context-of-Experts Decoding for ...

CompassMem: Event-Centric Logic Maps for Agent Memory

19 Jan 2026

Contributed by Lukas

A collaboration between the Gaoling School of Artificial Intelligence and Renmin University of China published a paper on January 8, 2026 titled "Memory Ma...

Multidimensional Safety Evaluation of Frontier AI Models

19 Jan 2026

Contributed by Lukas

This January 17, 2026 research collaboration between Fudan University, Shanghai Innovation Institute, Deakin University, and UIUC provides a report whic...

H-net: End-to-End Hierarchical Sequence Modeling via Dynamic Chunking

19 Jan 2026

Contributed by Lukas

In this July 15, 2025 collaboration between Carnegie Mellon University and Cartesia AI, researchers introduce H-net in the paper "Dynamic Chunking for ...

GLM-Image: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

19 Jan 2026

Contributed by Lukas

We review the January 1, 2026 paper "GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning" from...

MATTRL: Collaborative Test-Time Reinforcement Learning for Multi-Agent Reasoning

19 Jan 2026

Contributed by Lukas

The collaboration between MIT, NUS, NYU, Microsoft, UW, Columbia, and NTU describes an inference retrieval Chain of Thought enhancement. The researche...

Process Reward Learning for LLM Reasoning Optimization

19 Jan 2026

Contributed by Lukas

Researchers from the University of Illinois Urbana-Champaign have introduced Process Reward Learning (PRL) in a January 15, 2026 paper. PRL is a novel...

A Chain of Thought reasoning academic brawl

19 Jan 2026

Contributed by Lukas

We review a late 2025 heavyweight academic brawl over the future of AI reasoning when folks use Reinforcement Learning with Verifiable Rewards (RLVR) ...

OpenRouter 2025 Report: Analysis of Global LLM Usage Patterns

19 Jan 2026

Contributed by Lukas

We review two research papers, one from January 15, 2026 by OpenRouter Inc and a16z (Andreessen Horowitz) and another from April 2025 by Andrey Fradki...

How to make LLMs better at sci-fi writing

19 Jan 2026

Contributed by Lukas

In a joint collaboration between Harvard University, Carnegie Mellon University, and Stanford University, the January 12, 2026 paper "LLM Review: Enhancing...

Attention with a bias

17 Jan 2026

Contributed by Lukas

We review why some transformer models use a bias in attention and how ALiBi helps with long context. The provided sources focus on significant advance...

Squisher: Approximating the Fisher Information Matrix and use cases

17 Jan 2026

Contributed by Lukas

We focus on the July 2025 paper, "Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator". The pa...

NVIDIA: TTT-E2E: Unlocking Long-Context Learning via End-to-End Test-Time Training

17 Jan 2026

Contributed by Lukas

This December 31, 2025 NVIDIA research introduces **TTT-E2E**, a novel approach to large language model memory that treats long-context processing as ...

Scaling laws: long context length and in context learning

17 Jan 2026

Contributed by Lukas

Recent advancements in Long Context Language Models (LCLMs) demonstrate that In-Context Learning (ICL) capabilities follow predictable power-law scali...
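A power law of the kind the episode describes can be checked with a simple log-log regression. A minimal sketch with synthetic data (the function name and the data points are illustrative, not from the paper):

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space, the standard
    way to estimate scaling-law exponents from (examples, score) pairs."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope of the log-log line is the power-law exponent b.
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic points lying exactly on y = 2 * x**0.5.
xs = [1, 4, 16, 64]
ys = [2 * x ** 0.5 for x in xs]
a, b = fit_power_law(xs, ys)
```

On real benchmark numbers the fit is noisy, so the recovered exponent is an estimate rather than an exact value as in this toy case.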

DeepSeek Engram: Scaling Large Language Models via Conditional Memory Lookup

14 Jan 2026

Contributed by Lukas

On January 12, 2026 DeepSeek released its paper on **Engram**, a novel AI architecture that incorporates **conditional memory** to optimize how large ...

PageANN: Scalable Disk ANNS with Page-Aligned Graphs

07 Dec 2025

Contributed by Lukas

The research paper presents PageANN, a novel framework engineered to overcome the severe latency and scalability limitations facing existing **disk-ba...

NeurIPS 2025: Homogeneous Keys, Heterogeneous Values

04 Dec 2025

Contributed by Lukas

This research presents a novel method for efficient long-context modeling in Large Language Models (LLMs) by tackling the quadratic complexity of atte...

NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

29 Nov 2025

Contributed by Lukas

The research systematically investigates the effects of integrating various gating mechanisms into the standard softmax attention layer, comparing ove...

NeurIPS 2025: Large Language Diffusion Models

29 Nov 2025

Contributed by Lukas

This research paper introduces LLaDA, an 8-billion parameter language model based on the masked diffusion model (MDM) architecture, specifically devel...

NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example

29 Nov 2025

Contributed by Lukas

This research examines the data efficiency of Reinforcement Learning with Verifiable Reward (RLVR) when applied to large language models for mathemati...

NeurIPS 2025: Parallel Scaling Law for Language Models

29 Nov 2025

Contributed by Lukas

The research proposes Parallel Scaling (PARSCALE) as a novel, efficient strategy to enhance Large Language Model (LLM) capacity by increasing parallel...

NeurIPS 2025: SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data

29 Nov 2025

Contributed by Lukas

The academic paper introduces Self-play Reinforcement Learning (SeRL), a framework engineered to enhance the reasoning capabilities of Large Language ...

NeurIPS 2025: DYNAACT: Large Language Model Reasoning with Dynamic Action Spaces

29 Nov 2025

Contributed by Lukas

The provided text outlines DYNAACT, a new framework intended to enhance sequential reasoning in Large Language Models (LLMs) by dynamically managing t...

NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

29 Nov 2025

Contributed by Lukas

The academic paper introduces KGGen, a novel text-to-knowledge-graph generator designed to overcome the scarcity and poor quality of automatically ext...

NeurIPS 2025: Self-Adapting Language Models

29 Nov 2025

Contributed by Lukas

The academic paper presents the Self-Adapting LLM (SEAL) framework, designed to allow large language models to overcome their static nature by transfo...

NeurIPS 2025: Thinkless: LLM Learns When to Think

29 Nov 2025

Contributed by Lukas

The research introduces Thinkless, a framework designed to solve the computational inefficiency of Large Language Models (LLMs) that overuse chain-of-...

NeurIPS 2025: FlashBias: Fast Computation of Attention with Bias

29 Nov 2025

Contributed by Lukas

The source introduces FlashBias, an innovative algorithm designed to significantly accelerate the efficiency of the Transformer attention mechanism wh...

NeurIPS 2025: A-Mem: Agentic Memory for LLM Agents

29 Nov 2025

Contributed by Lukas

The source details the creation and evaluation of Agentic Memory (A-MEM), a novel memory system for Large Language Model (LLM) agents that addresses t...

NeurIPS 2025: MoBA: Mixture of Block Attention for Long-Context LLMs

29 Nov 2025

Contributed by Lukas

This paper introduces Mixture of Block Attention (MoBA) to address the prohibitive quadratic computational overhead inherent in traditional attention ...

NeurIPS 2025: Reward Reasoning Model

29 Nov 2025

Contributed by Lukas

The source details the development and evaluation of Reward Reasoning Models (RRMs), which are designed to enhance Large Language Model (LLM) alignmen...

Anthropic: Disrupting the First AI-Orchestrated Cyber Espionage Campaign

27 Nov 2025

Contributed by Lukas

Anthropic released a detailed report outlining the detection and disruption of an advanced cyber espionage campaign identified in late 2025, which the...

Anthropic: reward hacking & misalignment & sabotage

22 Nov 2025

Contributed by Lukas

Anthropic’s research details how **realistic AI training processes can inadvertently create misaligned models** through a mechanism called "reward h...

DeepSeek-OCR: Contexts Optical Compression

22 Nov 2025

Contributed by Lukas

The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) designed to investigate the feasibility of **contexts o...

Neuromorphic computing: Brain-Inspired AI and Hardware

22 Nov 2025

Contributed by Lukas

These sources provide a comprehensive overview of **neuromorphic computing (NC)**, focusing heavily on specialized hardware and advanced Spiking Neura...

Meta: SAM 3

20 Nov 2025

Contributed by Lukas

This November 18, 2025 Meta paper details the development, training, and evaluation of **Segment Anything Model 3 (SAM 3)**, a promptable segmentation ...

Mamba-360: State Space Models for Long Sequence Modeling

19 Nov 2025

Contributed by Lukas

The April 24, 2024 paper provides a comprehensive **survey of State Space Models (SSMs)**, outlining their evolution, fundamental mathematical princip...

Mixture-of-Depths: Dynamic Compute Allocation in Transformers

19 Nov 2025

Contributed by Lukas

This April 4, 2024 Google DeepMind paper introduces the **Mixture-of-Depths (MoD)** transformer architecture, a method that improves efficiency by le...

MLP Mixer Models

19 Nov 2025

Contributed by Lukas

These sources collectively explore the **MLP-Mixer architecture** and its numerous extensions across computer vision and audio tasks. The core concept...

Marin: Open LLM Optimization & Diagnostics

19 Nov 2025

Contributed by Lukas

Marin is an open lab dedicated to the transparent research and development of foundation models (FMs), focusing its core mission on identifying **how ...

vAttention Vs Strata: advanced GPU memory management

19 Nov 2025

Contributed by Lukas

We compare and contrast two advanced 2025 memory management and scheduling techniques for optimizing Large Language Model (LLM) serving throughput and...

AMD: Instella: Fully Open Language Models with Stellar Performance

16 Nov 2025

Contributed by Lukas

The November 13, 2025 paper by AMD introduces **Instella**, a new family of **fully open-source** three-billion-parameter large language models (LLMs) ...

Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features

15 Nov 2025

Contributed by Lukas

This episode draws on ten different sources, excerpts from academic papers and technical reports focusing on mechanistic interpretability ...

Spectral Gap: Analysis of Attention Layers and Graph Transformers

10 Nov 2025

Contributed by Lukas

We review two papers on the Spectral Gap, one from 2021 and another from 2025. The first source presents the **Spectral Attention Network (SAN)**, a novel Tran...

CARTRIDGE: Efficient In-Context Learning via Distillation

10 Nov 2025

Contributed by Lukas

The June 13, 2025 joint collaboration between Stanford University, Caltech and University at Buffalo introduces a novel method called **CARTRIDGE** fo...

Metacognition and Skill Discovery in LLM Math Reasoning

10 Nov 2025

Contributed by Lukas

The May 20, 2024 academic paper explores the **metacognitive capabilities of Large Language Models (LLMs)**, specifically focusing on mathematical pro...

Context Distillation for Language Models

10 Nov 2025

Contributed by Lukas

These five papers from 2022 up to 2025 discuss various **knowledge distillation techniques** aimed at transferring the capabilities of large language ...

Tempo: SLO-Aware LLM Serving Maximizing Service Gain

10 Nov 2025

Contributed by Lukas

The April 24, 2025 academic paper introduces **Tempo**, a novel scheduling system designed to optimize Large Language Model (LLM) serving by addressin...

LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

10 Nov 2025

Contributed by Lukas

The January 30, 2025 paper introduces **LLM-AutoDiff**, a novel framework for **Automatic Prompt Engineering (APE)** that allows for the optimization ...

Confucius: Intent-Driven Network Management with Multi-Agent LLMs

10 Nov 2025

Contributed by Lukas

The August 27, 2025 paper introduces **Confucius**, a novel multi-agent Large Language Model (LLM) framework developed by Meta for **intent-driven net...

SYMPHONY: Memory Management for LLM Multi-Turn Inference

10 Nov 2025

Contributed by Lukas

The 2024 paper introduces **SYMPHONY**, a novel system designed to improve memory management and scheduling for **Large Language Model (LLM) inference...

DSPy and TextGrad: Compiling Language Model Systems

10 Nov 2025

Contributed by Lukas

These two academic papers introduce novel programming models aimed at systematically optimizing complex AI systems, particularly those built using Lar...

Vidur: Simulation for Efficient LLM Inference Deployment

10 Nov 2025

Contributed by Lukas

The May 21, 2024 paper introduces **Vidur**, a new, high-fidelity simulation framework designed to optimize the deployment and performance of Large La...

Continuous Autoregressive Language Models: CALM

10 Nov 2025

Contributed by Lukas

The October 31, 2025 paper introduces **Continuous Autoregressive Language Models (CALM)**, a new paradigm designed to overcome the efficiency bottlen...

A Framework for LLM Application Safety Evaluation

10 Nov 2025

Contributed by Lukas

The July 13, 2025 paper "Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications" introduces a practical **fra...

Doubly Stochastic Attention for Transformers

10 Nov 2025

Contributed by Lukas

The four papers we review, dating from 1967 up to two papers in 2025, collectively discuss the mathematical properties and deep learning applications of ...
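A matrix is doubly stochastic when every row and every column sums to one, and the classic way to project a positive matrix onto that set is Sinkhorn-Knopp iteration: alternately normalize rows and columns. A minimal sketch (the function name and iteration count are ours, not the papers'):

```python
def sinkhorn(matrix, iters=50):
    """Sinkhorn-Knopp: alternately normalize rows and columns of a
    positive square matrix until it is approximately doubly stochastic."""
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(iters):
        # Row normalization: each row sums to 1.
        for i in range(n):
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        # Column normalization: each column sums to 1.
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m

ds = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

For strictly positive matrices the alternation converges to a unique doubly stochastic limit, which is what makes it usable as a drop-in normalizer for attention scores.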

Random Walk Methods for Graph Learning and Networks

10 Nov 2025

Contributed by Lukas

We review the evolution from PageRank to Random Walk with Restart and its application to neural networks, focusing on five...
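Random Walk with Restart personalizes PageRank: at each step the walker returns to a seed node with probability alpha instead of teleporting uniformly. A minimal power-iteration sketch on a toy graph (the graph, parameter values, and function name are illustrative):

```python
def rwr(neighbors, seed, alpha=0.15, iters=200):
    """Random Walk with Restart via power iteration.

    neighbors: dict node -> non-empty list of out-neighbors.
    seed: restart node; alpha: restart probability.
    Returns the stationary visiting probabilities."""
    nodes = list(neighbors)
    p = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(iters):
        # Restart mass goes entirely to the seed node.
        nxt = {u: (alpha if u == seed else 0.0) for u in nodes}
        # Remaining mass diffuses uniformly along out-edges.
        for u in nodes:
            share = (1 - alpha) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        p = nxt
    return p

# Tiny triangle graph; probabilities sum to 1 and the seed ranks highest.
g = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
scores = rwr(g, "a")
```

The resulting scores are exactly the proximity-to-seed measure that graph learning methods reuse as node features or affinity weights.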

AlphaEvolve: Mathematical Discovery at Scale

10 Nov 2025

Contributed by Lukas

The November 3, 2025 paper provides an overview of the **AlphaEvolve** system, an AI-powered evolutionary approach for mathematical exploration and dis...

AdaFlow: Variance-Adaptive Flow-Based Imitation Learning

10 Nov 2025

Contributed by Lukas

The November 22, 2024 paper from the University of Texas introduces **AdaFlow**, a novel imitation learning framework designed to improve both the efficiency and div...

zFLoRA: Zero-Latency Fused Low-Rank Adapters

04 Nov 2025

Contributed by Lukas

The October 28, 2025 Samsung research paper introduces **zFLoRA (zero-latency fused low-rank adapter)**, a novel parameter-efficient fine-tuning (PEFT...

SuperBPE: Space Travel for Language Models

04 Nov 2025

Contributed by Lukas

The August 26, 2025 collaboration between the University of Washington, NVIDIA, and the Allen Institute for AI introduces **"SuperBPE: Space Trav...
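For context, vanilla byte-pair encoding repeatedly merges the most frequent adjacent symbol pair; SuperBPE's departure is letting those merges cross whitespace ("space travel"), whereas standard BPE stops at word boundaries. A toy sketch of one merge step (the corpus and helper names are ours):

```python
from collections import Counter

def most_frequent_pair(token_seqs):
    """Count adjacent symbol pairs across a corpus, the statistic BPE
    uses to pick its next merge rule."""
    pairs = Counter()
    for seq in token_seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(seq, pair):
    """Apply one BPE merge: replace every occurrence of the chosen
    adjacent pair with a single concatenated symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

corpus = [list("low"), list("lower"), list("lowest")]
pair = most_frequent_pair(corpus)
merged = [merge_pair(seq, pair) for seq in corpus]
```

A full tokenizer just repeats this pick-and-merge loop until it hits the target vocabulary size.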

Google: Supervised Reinforcement Learning for Step-wise Reasoning in LLMs

04 Nov 2025

Contributed by Lukas

The October 29, 2025 Google research paper introduces **Supervised Reinforcement Learning (SRL)**, a novel framework designed to improve the complex, m...

MorphKV: Constant-Sized KV Caches for LLM Inference

04 Nov 2025

Contributed by Lukas

This June 7, 2025 academic paper, a collaboration between UT Austin and the University of British Columbia, introduces **MorphKV**, a novel inference-time technique de...

HALoS: Hierarchical Asynchronous LLM Training over Slow Networks

04 Nov 2025

Contributed by Lukas

The June 5, 2025 research paper introduces **HALoS: Hierarchical Asynchronous Local SGD**, a novel optimization framework designed for training large...

Anchored Diffusion Language Model: Superior Generation and Reasoning

04 Nov 2025

Contributed by Lukas

The May 24, 2025 UT Austin paper introduces the **Anchored Diffusion Language Model (ADLM)**, a novel approach that aims to improve discrete language ...

Gumbel-Softmax for Differentiable Categorical Reparameterization and Selective Networks

04 Nov 2025

Contributed by Lukas

These two papers (years 2017, 2022) introduce and then apply the **Gumbel-Softmax distribution** as a differentiable gradient estimator for **categori...
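The core trick is adding Gumbel(0, 1) noise to the logits and relaxing the argmax into a temperature-controlled softmax, so a categorical "sample" becomes differentiable. A minimal sketch (the temperature and logits are illustrative):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample from a categorical distribution.

    Gumbel(0,1) noise is -log(-log(U)) for U ~ Uniform(0,1); adding it
    to the logits and taking a tau-scaled softmax gives a differentiable
    sample that approaches a hard one-hot vector as tau -> 0."""
    g = [-math.log(-math.log(rng.random())) for _ in logits]
    z = [(l + n) / tau for l, n in zip(logits, g)]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

random.seed(0)
sample = gumbel_softmax([1.0, 2.0, 0.5], tau=0.5)
```

In a training loop this sits where a discrete choice would otherwise block gradients, e.g. selecting which branch of a selective network to execute.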

PolicySmith: Automated Systems Heuristic Generation via LLMs

04 Nov 2025

Contributed by Lukas

The October 9, 2025 paper from UT Austin introduces **PolicySmith**, a novel framework that automates the design of system policies, arguing tha...

RetNet: Retentive Networks: Transformer Successor for Large Language Models

02 Nov 2025

Contributed by Lukas

The August 9, 2023 paper introduces the **Retentive Network (RetNet)**, a proposed foundational architecture for large language models intended to suc...

Kimi Linear: Efficient Expressive Attention Architecture

02 Nov 2025

Contributed by Lukas

The October 30, 2025 **technical report** details the development and evaluation of **Kimi Linear**, a novel **hybrid linear attention architecture** ...

ALiBi: Attention with Linear Biases Enables Length Extrapolation

01 Nov 2025

Contributed by Lukas

The April 22, 2022 collaboration between University of Washington, Facebook AI and the Allen Institute for AI introduces Attention with Linear Biases ...
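ALiBi drops positional embeddings entirely and instead adds a fixed, per-head linear penalty on query-key distance to the attention scores. A minimal sketch of the causal bias computation, using the paper's geometric slope schedule (the helper name is ours):

```python
def alibi_bias(seq_len, n_heads):
    """Per-head linear distance penalties for causal attention.

    Head h uses slope 2**(-8 * (h + 1) / n_heads), the geometric
    schedule from the ALiBi paper; bias[h][i][j] = -slope * (i - j)
    for each key position j <= query position i."""
    biases = []
    for h in range(n_heads):
        slope = 2 ** (-8 * (h + 1) / n_heads)
        biases.append([[-slope * (i - j) for j in range(i + 1)]
                       for i in range(seq_len)])
    return biases

b = alibi_bias(seq_len=4, n_heads=8)
```

Because the penalty grows linearly with distance and is the same function at every length, a model trained on short sequences can be run on longer ones without retraining, which is the length-extrapolation property in the title.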

Quest: Query-Aware Sparsity for Efficient LLM Inference

31 Oct 2025

Contributed by Lukas

The August 26, 2024 academic paper introduces **Quest**, a novel algorithm designed to improve the inference efficiency of **Long-Context Large Langua...

Flash-LLM: Efficient LLM Inference with Unstructured Sparsity on Tensor Cores

31 Oct 2025

Contributed by Lukas

The September 19, 2025 Alibaba paper introduces **Flash-LLM**, a novel software framework designed to enable **cost-effective and highly-efficient inf...

ELASTIC: Linear Attention for Sequential Interest Compression

31 Oct 2025

Contributed by Lukas

The February 12, 2025 KuaiShou Inc paper introduces **ELASTIC**, an Efficient Linear Attention for SequenTial Interest Compression framework designed ...

Anthropic: Introspective Awareness in LLMs

31 Oct 2025

Contributed by Lukas

On October 29, 2025 Anthropic presented research investigating the existence of **functional introspective awareness** in large language models (LLMs)...

Small Versus Large Models for Requirements Classification

31 Oct 2025

Contributed by Lukas

The October 24, 2025 collaboration between many universities has produced a paper that compares the performance of **Large Language Models (LLMs)** ...

Hyper-Scaling LLM Inference with KV Cache Compression

31 Oct 2025

Contributed by Lukas

The June 5, 2025 collaboration between the University of Edinburgh and NVIDIA introduces the concept of **inference-time hyper-scaling** for large l...

Architectural Scaling Laws for Efficient LLMs

31 Oct 2025

Contributed by Lukas

The October 21, 2025 collaboration paper between UW-Madison and Amazon Web Services discusses the critical role of the **Multi-Layer Perceptron (MLP) in...

ATTENTION2D and lean attention: Distributed Self-Attention

29 Oct 2025

Contributed by Lukas

We cover two new innovations from Microsoft extending ideas from the original **FlashAttention**. FlashAttention is an IO-aware attention algorit...

Sentence-BERT: Siamese Networks for Sentence Embeddings

29 Oct 2025

Contributed by Lukas

The provided text introduces **Sentence-BERT (SBERT)**, a modification of the popular **BERT** and **RoBERTa** language models, designed to efficientl...
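SBERT derives a fixed-size sentence vector by pooling token embeddings (mean pooling by default) and then compares sentences with cosine similarity, which is what makes large-scale semantic search cheap. A minimal mask-aware sketch with toy 2-d vectors (the data and helper names are illustrative):

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Mask-aware mean pooling over token vectors, SBERT's default
    way of turning per-token outputs into one sentence embedding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding tokens
            count += 1
            for d in range(dim):
                total[d] += vec[d]
    return [t / count for t in total]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Toy 2-d "token embeddings"; the padded token (mask 0) is ignored.
emb = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
sim = cosine(emb, [1.0, 1.0])
```

The siamese part of SBERT is simply running the same encoder on both sentences and training so that this cosine score tracks human similarity judgments.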

TxGNN: Foundation Model for Zero-Shot Drug Repurposing

29 Oct 2025

Contributed by Lukas

The source provides excerpts from a scientific paper introducing **TxGNN**, a novel graph foundation model designed for **zero-shot drug repurposing**...

STAR: Sub-Entry Sharing TLB for Multi-Instance GPU Efficiency

26 Oct 2025

Contributed by Lukas

This April 29, 2024 paper provides an overview of the challenges associated with using **NVIDIA's Multi-Instance GPU (MIG)** technology, specifically...

Strata: Efficient Hierarchical Context Caching for LLM Serving

26 Oct 2025

Contributed by Lukas

The August 26, 2025 collaboration between Stanford, NVIDIA, Shanghai Jiao Tong University, University of Michigan, University of Colorado Boulder, Car...

FlashAttention: IO-Aware Fast and Memory-Efficient Attention

26 Oct 2025

Contributed by Lukas

This is a classic review of a now old but yet still important paper, the original Flash Attention paper. We review this in light of advances in compil...
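At the heart of FlashAttention's tiling is the online (streaming) softmax: a running maximum and a rescaled running sum let each row of attention scores be processed one tile at a time without ever materializing the full row. A minimal scalar sketch of that trick (the function name is ours):

```python
import math

def online_softmax(scores):
    """Single-pass, numerically stable softmax.

    Maintains a running maximum and a running sum of exponentials;
    whenever the maximum changes, the sum is rescaled by
    exp(old_max - new_max). This is the streaming identity that lets
    FlashAttention process attention scores tile by tile."""
    running_max = float("-inf")
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + math.exp(s - new_max))
        running_max = new_max
    return [math.exp(s - running_max) / running_sum for s in scores]

probs = online_softmax([1.0, 2.0, 3.0])
```

The real kernel applies the same rescaling to partial weighted sums of the value vectors, so the whole attention output is computed in one sweep over on-chip tiles.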

Introducing MTEB v2: Multimodal Embedding Evaluation

26 Oct 2025

Contributed by Lukas

On October 20, 2025 Hugging Face released **MTEB v2**, a significant refactoring of the Massive Text Embedding Benchmark, which was originally designe...

Structural Understanding of LLM Overthinking

26 Oct 2025

Contributed by Lukas

The October 10, 2025 paper from the University of Michigan and **Google DeepMind** concerning the phenomenon of **"overthinking" in Large Language Mod...

Stuck in the Matrix: LLM Spatial Reasoning

26 Oct 2025

Contributed by Lukas

The October 23, 2025 research paper **probes the spatial reasoning capabilities of Large Language Models (LLMs) when processing text-based inputs**, sp...

LLM-Empowered Knowledge Graph Construction: A Survey

26 Oct 2025

Contributed by Lukas

This October 23, 2025 Xidian University academic survey systematically reviews the transformative impact of **Large Language Models (LLMs)** on the th...

Survey of Emerging Topics in AI and Robotics

26 Oct 2025

Contributed by Lukas

The October 23, 2025 collaboration between UC San Diego, NVIDIA, Meta, UW-Madison, and UNC introduces **Real Deep Research (RDR)**, a systematic f...

The Free Transformer: VAE Extension for Decoders

26 Oct 2025

Contributed by Lukas

The October 20, 2025 Meta FAIR paper introduces the **Free Transformer**, an innovative extension of the decoder-only Transformer architecture, which ...

LithOS: Operating System for Efficient GPU Machine Learning

26 Oct 2025

Contributed by Lukas

This 2025 CMU paper introduces **LithOS**, a novel operating system designed to improve the efficiency and utilization of Graphics Processing Units (G...

Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning

26 Oct 2025

Contributed by Lukas

This October 23, 2025 technical report from the Ling Team introduces the **Ring-linear model series**, specifically Ring-mini-linear-2.0 and Ring-flas...

GigaBrain-0: World Model-Powered Generalist Robots

26 Oct 2025

Contributed by Lukas

The October 22, 2025 GigaAI paper introduces **GigaBrain-0**, a novel Vision-Language-Action (VLA) model designed for general-purpose robotic systems,...

Open-o3 Video: Spatio-Temporal Grounded Reasoning

26 Oct 2025

Contributed by Lukas

The October 25, 2025 ByteDance paper introduces **Open-o3 Video**, a novel framework developed by researchers from **Peking University** and **ByteDan...

Cattell–Horn–Carroll Theory of Intelligence

26 Oct 2025

Contributed by Lukas

We review the Cattell-Horn-Carroll (CHC) theory of intelligence as used in recent AI papers on the definition of what AGI could be. The provided sources offer a comprehensive o...

Internal Mechanisms of a Large Language Model

26 Oct 2025

Contributed by Lukas

This March 27, 2025 Anthropic paper provides an overview and detailed excerpts from two related Anthropic papers concerning the **interpretability of ...

Latent Constituency in Humans and LLMs

26 Oct 2025

Contributed by Lukas

The provided text is an academic paper titled **"Active Use of Latent Constituency Representation in both Humans and Large Language Models,"** which e...

Cognitive Impact of AI and Search on Essay Writing

26 Oct 2025

Contributed by Lukas

The June 2025 paper presents excerpts from a study examining the **cognitive and performance differences** in essay writing among participants using a...

LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts

26 Oct 2025

Contributed by Lukas

The October 7, 2025 technical release by Liquid AI introduces their new model, **LFM2-8B-A1B**, an **on-device Mixture-of-Experts (MoE)** designed fo...
