AI: post transformers
Episodes
HBF: High Bandwidth Flash for AI Inferencing
15 Oct 2025
Contributed by Lukas
These sources and patent discuss **SanDisk's development of High Bandwidth Flash (HBF)**, a technology designed to address the significant memory and ...
Architectural Migration to Multi-head Latent Attention
15 Oct 2025
Contributed by Lukas
The sources detail a novel method called **MHA2MLA** (Multi-Head Attention to Multi-Head Latent Attention), which efficiently adapts pre-trained large...
COPA: Composable On-Package GPU Architecture for Domain Specialization
15 Oct 2025
Contributed by Lukas
This April 2021 academic paper from **NVIDIA** discusses the challenge of designing **converged GPUs** that efficiently handle the diverging architect...
Performance of Confidential Computing for Large Language Models
11 Oct 2025
Contributed by Lukas
These sources collectively discuss advancements in **scalable, efficient, and secure machine learning (ML) data systems**, often within the context of...
Google: Confidential Computing with Accelerated AI Workloads on GCE
11 Oct 2025
Contributed by Lukas
The provided sources are a collection of Google Cloud documentation and blog excerpts detailing the features and implementation of **Confidential Comp...
AWS: Nitro System: Security, Enclaves, and Generative AI
11 Oct 2025
Contributed by Lukas
These sources provide an extensive overview of **AWS Nitro Enclaves**, an isolated compute environment designed to protect highly sensitive data withi...
Anthropic: Confidential Inference via Trusted Virtual Machines
11 Oct 2025
Contributed by Lukas
These sources, an announcement from Anthropic and a technical whitepaper co-authored with Pattern Labs, provide an **overview of Confidential Inferenc...
RAND: Securing AI Model Weights: Preventing Theft and Misuse
11 Oct 2025
Contributed by Lukas
The provided texts are excerpts from a **RAND Corporation research report** titled "Securing AI Model Weights: Preventing Theft and Misuse of Frontier...
Training-Free GRPO: Policy Optimization via Context Space
11 Oct 2025
Contributed by Lukas
The October 9, 2025 paper from **Tencent Youtu Lab** introduces **Training-Free Group Relative Policy Optimization (Training-Free GRPO)**, a novel met...
Multi-Agent Tool-Integrated Policy Optimization (MATPO)
11 Oct 2025
Contributed by Lukas
The October 6, 2025 paper introduces **Multi-Agent Tool-Integrated Policy Optimization (MATPO)**, a novel reinforcement learning framework designed to...
UniVideo: Unified Video Understanding, Generation, and Editing
11 Oct 2025
Contributed by Lukas
The October 9, 2025 paper details the architecture, training, and evaluation of **UniVideo**, a unified multimodal generative system capable of **hand...
Dragon Hatchling: Brain-Inspired AI Architecture
10 Oct 2025
Contributed by Lukas
This September 30, 2025 paper details research into **Brain Dynamics Hypothesis (BDH)** models, particularly the **BDH-GPU** architecture, which propos...
AGENTFLOW: In-the-Flow Agentic System Optimization
10 Oct 2025
Contributed by Lukas
The October 7, 2025 paper, a collaboration between Stanford University, Texas A&M University, UC San Diego, and Lambda, introduces **AGENTFLOW**, a no...
Less is More: Recursive Reasoning with Tiny Networks
10 Oct 2025
Contributed by Lukas
This October 6, 2025 paper from Alexia Jolicoeur-Martineau at Samsung SAIL Montréal provides an overview and detailed comparison of two recurrent re...
Early Experience for Language Agent Improvement
10 Oct 2025
Contributed by Lukas
This October 10, 2025 academic paper, a collaboration between Meta Superintelligence Labs, FAIR at Meta, and The Ohio State University, proposes and...
Petri: Accelerating AI Safety Auditing
10 Oct 2025
Contributed by Lukas
On October 6, 2025, Anthropic introduced **Petri (Parallel Exploration Tool for Risky Interactions)**, an open-source framework developed for automated...
Agentic Context Engineering: Evolving Contexts for Self-Improving LLMs
10 Oct 2025
Contributed by Lukas
The October 6, 2025 paper introduces **Agentic Context Engineering (ACE)**, a novel framework designed to enhance the performance of Large Language Mo...
CLUE: Hidden-State Clustering for Non-parametric Verification
10 Oct 2025
Contributed by Lukas
The October 2, 2025 technical report from **Tencent AI Lab** introduces **CLUE (Clustering and Experience-based Verification)**, a novel, non-parametr...
Low-Precision Transformer Failure in Flash Attention
10 Oct 2025
Contributed by Lukas
This October 5 2025 paper presents the first mechanistic explanation for a persistent **training instability** experienced when using **low-precision ...
Paris: Decentralized Open-Weight Diffusion Model
08 Oct 2025
Contributed by Lukas
The October 2025 paper introduces **Paris**, a novel open-weight diffusion model for text-to-image generation that was trained using a completely **de...
DC-VideoGen: Efficient Video Generation with Deep Compression
08 Oct 2025
Contributed by Lukas
The September 29 2025 paper introduces **DC-VideoGen**, a new post-training framework designed to significantly accelerate video diffusion models and ...
GNN101: Visual Learning of Graph Neural Networks
08 Oct 2025
Contributed by Lukas
The November 2024 paper introduces **GNN101**, an open-source, web-based interactive visualization tool designed to help non-experts learn about **Gra...
Reactive Transformer: Stateful Real-Time Language Models
08 Oct 2025
Contributed by Lukas
The October 2025 paper introduces the **Reactive Transformer (RxT)**, a novel neural network architecture designed by Adam Filipek and Reactive AI to ...
Imperceptible Jailbreaking Against Large Language Models
08 Oct 2025
Contributed by Lukas
The October 2025 academic paper introduces a novel **imperceptible jailbreaking attack** against Large Language Models (LLMs) that exploits Unicode **...
ACON: Optimizing Context Compression for LLM Agents
08 Oct 2025
Contributed by Lukas
The October 2025 paper provides an overview of **Agent Context Optimization (ACON)**, a novel framework designed to enhance the efficiency and performa...
CoDA: Collaborative Multi-Agent Data Visualization
08 Oct 2025
Contributed by Lukas
The October 2025 paper introduces **CoDA (Collaborative Data-visualization Agents)**, a novel multi-agent system designed to automate complex data vis...
RECAP: Safety Alignment via Counter-Aligned Prefilling
08 Oct 2025
Contributed by Lukas
The October 2025 academic paper introduces **RECAP (Robust Safety Alignment via Counter-Aligned Prefilling)**, a novel reinforcement learning (RL) met...
ONNX Ecosystem, Optimization, and Deployment
08 Oct 2025
Contributed by Lukas
The provided sources center on the **Open Neural Network Exchange (ONNX)** format and its inference engine, **ONNX Runtime**, highlighting their role ...
Emergent Abilities of Large Language Models
08 Oct 2025
Contributed by Lukas
The sources (October 2022, March 2025) provide an extensive examination of **emergent abilities** in large language models (LLMs), defining them as un...
Implicit Dynamics of In-Context Learning
08 Oct 2025
Contributed by Lukas
This July 2025 research paper explores **In-Context Learning (ICL)** in Large Language Models (LLMs), which is the striking ability of these models to...
Contextual Blocks: Implicit Weight Updates and Federated Learning
08 Oct 2025
Contributed by Lukas
We compare and contrast the math behind two recent research papers which we have covered individually before on this podcast: July 2025: Learning withou...
MotionRAG: Retrieval-Augmented Image-to-Video Generation
08 Oct 2025
Contributed by Lukas
The September 2025 paper introduces **MotionRAG**, a novel retrieval-augmented framework designed to enhance motion realism in image-to-video generati...
NIST Evaluation of DeepSeek AI Models
08 Oct 2025
Contributed by Lukas
The provided text is an excerpt from a **technical evaluation report** conducted by the Center for AI Standards and Innovation (CAISI), housed within ...
Test-Time Reinforcement Learning for LLMs
08 Oct 2025
Contributed by Lukas
This June 2025 paper introduces a novel methodology called **Test-Time Reinforcement Learning (TTRL)**, which enables Large Language Models (LLMs) to ...
LongCodeZip: Compress Long Code Context for LLMs
08 Oct 2025
Contributed by Lukas
The October 2025 paper introduces **LongCodeZip**, a novel, training-free, and model-agnostic framework designed for **compressing long code contexts*...
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
08 Oct 2025
Contributed by Lukas
The September 2025 paper introduces **ReasoningBank**, a novel memory framework designed to enhance Large Language Model (LLM) agents by distilling an...
Analog In-Memory Attention for Energy-Efficient LLMs
08 Oct 2025
Contributed by Lukas
This November 2024 paper and a new September 2025 analysis provide a comprehensive overview of a novel **Analog In-Memory Computing (AIMC)** architec...
Regression Language Models for Code Metrics
03 Oct 2025
Contributed by Lukas
This September 30, 2025 academic paper introduces Regression Language Models (RLMs) as a unified method for code-to-metric regression, which is the ta...
Introducing RTEB: Retrieval Embedding Benchmark
03 Oct 2025
Contributed by Lukas
The text introduces the **Retrieval Embedding Benchmark (RTEB)**, a new standard designed to accurately evaluate the **retrieval accuracy of embedding...
CUDA Unified Memory and Heterogeneous Memory Management
02 Oct 2025
Contributed by Lukas
The provided sources offer a comprehensive look at memory management for GPU-accelerated computing, focusing heavily on **Heterogeneous Memory Managem...
Moravec's Paradox and AI Automation Limits
01 Oct 2025
Contributed by Lukas
These two 2025 research papers collaboratively examine **Moravec's Paradox**, which posits that skills effortless for humans (like perception and mobi...
Characterizing LLM KV Cache Workloads in Production
01 Oct 2025
Contributed by Lukas
The June 2025 paper characterizes and optimizes the **Key-Value Cache (KV$)** workload patterns associated with serving large language models (LLMs) a...
BurstGPT: A Real-World LLM Serving Workload Dataset
01 Oct 2025
Contributed by Lukas
The May 2025 academic paper introduces **BurstGPT**, a novel, real-world workload dataset consisting of over ten million traces from regional Azure Op...
Qwen3-Next & Qwen3-Omni technical report
30 Sep 2025
Contributed by Lukas
These May and September 2025 technical reports introduce and evaluate two distinct but related large language models: the **Qwen3 family** and the **Q...
Variational Reasoning Framework for Language Models
29 Sep 2025
Contributed by Lukas
This source is an excerpt from a September 26, 2025 research paper introducing a variational reasoning framework designed to enhance the reasoning cap...
Federated Learning with Soft Embeddings for Retrieval
27 Sep 2025
Contributed by Lukas
This September 20, 2025 paper introduces a novel, efficient architecture for training **retrieval models** used in retrieval-augmented generation (RAG) ...
Schoenfeld Theory Applied to Large Reasoning Models
27 Sep 2025
Contributed by Lukas
This September 18 2025 paper introduces a research project that applies **Schoenfeld’s Episode Theory**, a classic cognitive framework for analyzing...
CWM: Code Generation with World Models
27 Sep 2025
Contributed by Lukas
This Meta September 24 2025 paper provides an extensive overview of **Code World Model (CWM)**, a 32-billion-parameter dense decoder-only Transformer ...
EmbeddingGemma: Powerful Lightweight Text Representations
26 Sep 2025
Contributed by Lukas
The September 24 2025 paper introduces **EmbeddingGemma**, a novel, lightweight text embedding model developed by **Google DeepMind**, built upon the ...
CE-GPPO: Controlling Entropy via Gradient-Preserving Policy Optimization
26 Sep 2025
Contributed by Lukas
The September 25, 2025 paper introduces a novel reinforcement learning (RL) algorithm, **Controlling Entropy via Gradient-Preserving Policy Optimizatio...
Seedream 4.0: Multimodal Image Generation System
26 Sep 2025
Contributed by Lukas
The September 24 2025 paper is a technical report from **ByteDance Seed** detailing the **Seedream 4.0** system, an advanced multimodal image generati...
Tree-based Group Policy Optimization for LLM Agents
26 Sep 2025
Contributed by Lukas
The September 25 2025 paper introduces **Tree-based Group Relative Policy Optimization (Tree-GRPO)**, a new reinforcement learning (RL) method designe...
GDPval: Measuring AI Performance on Real-World Work
26 Sep 2025
Contributed by Lukas
The sources, dated September 25, 2025, introduce **GDPval**, a novel benchmark created by OpenAI to evaluate the performance of **AI models** on **econom...
Adaptive Compression Techniques for Efficient LLM Inference
20 Sep 2025
Contributed by Lukas
These 14 research papers provide an overview of various **compression techniques for Large Language Models (LLMs)**, primarily focusing on **reducing ...
LLM-I: Interleaved Multimodal Creators via Tool-Use
20 Sep 2025
Contributed by Lukas
The September 2025 academic paper introduces **LLM-Interleaved (LLM-I)**, a novel, flexible framework for interleaved image-text generation that refra...
Evolving Language Models Without Labels: EVOL-RL
19 Sep 2025
Contributed by Lukas
This September 2025 research paper from Tencent AI Lab and academic collaborators introduces EVOL-RL, an Evolution-Oriented ...
SearchInstruct: Instruction Tuning with Dynamic Retrieval
19 Sep 2025
Contributed by Lukas
This September 2025 paper introduces SearchInstruct, a novel framework designed to enhance Supervised Fine-Tuning (SFT) of large language models (LLMs...
THOR: Hierarchical RL for Mathematical Reasoning
19 Sep 2025
Contributed by Lukas
This September 2025 paper describes THOR (Tool-Integrated Hierarchical Optimization via RL), a novel approach designed to enhance the mathematical re...
The Uneven Diffusion of AI Adoption
19 Sep 2025
Contributed by Lukas
The "Anthropic Economic Index report" documents the rapid and uneven adoption of Artificial Intelligence (AI), specifically using data from the compan...
FlowRL: Distribution Matching for LLM Reasoning
19 Sep 2025
Contributed by Lukas
This September 2025 paper introduces FlowRL, a novel reinforcement learning (RL) algorithm for large language models (LLMs) that shifts the optimizat...
Single-stream Policy Optimization for LLMs
19 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Single-stream Policy Optimization (SPO), a new reinforcement learning algorithm for training Large Language Mode...
Pre-computing & reusing KV caches to accelerate RAG inference
18 Sep 2025
Contributed by Lukas
How can pre-computing and reusing Key-Value (KV) caches accelerate inference for Retrieval-Augmented Generation and other long-context LLM tasks? The p...
REFRAG: Rethinking RAG-based Decoding
18 Sep 2025
Contributed by Lukas
This September 2025 academic paper, titled "REFRAG: Rethinking RAG based Decoding," appears on the alphaXiv pre-print server. It focuses on Reframing ...
DeepSeek-R1: Reinforcing LLM Reasoning Through Self-Evolution
18 Sep 2025
Contributed by Lukas
This paper, "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," published in Nature on September 17, 2025, details the develop...
ShadowKV: High-Throughput Long-Context LLM Inference
17 Sep 2025
Contributed by Lukas
This April 2025 paper introduces ShadowKV, an innovative inference system for long-context Large Language Models (LLMs) designed to significantly e...
TailorKV: Hybrid KV Cache Compression for LLMs
17 Sep 2025
Contributed by Lukas
This May 2025 paper introduces TailorKV, a novel hybrid framework designed to optimize Key-Value (KV) cache management in large language models (LLMs)...
MIRAGE: Optimizing LLM KV Cache with Parameter Remapping
17 Sep 2025
Contributed by Lukas
This July 2025 paper discusses advanced memory optimization techniques for Large Language Models (LLMs), particularly focusing on KV cache managemen...
WebSailor-V2: Bridging Proprietary Agents with Synthetic Data and RL
17 Sep 2025
Contributed by Lukas
This September 2025 paper introduces WebSailor-V2, an open-source deep research agent developed by Alibaba Group's Tongyi Lab. The paper details a ...
Dynamic Chunking for Hierarchical Sequence Modeling
17 Sep 2025
Contributed by Lukas
This July 2025 paper introduces Hierarchical Networks (H-Nets), a novel architecture designed to move beyond traditional tokenization in large langua...
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning
17 Sep 2025
Contributed by Lukas
This September 2025 paper introduces LoFT, a novel framework designed to improve Long-Tailed Semi-Supervised Learning (LTSSL) by leveraging paramet...
QuantAgent: Multi-Agent LLM for High-Frequency Trading
17 Sep 2025
Contributed by Lukas
This September 2025 paper describes QuantAgent, a novel multi-agent large language model (LLM) framework designed for high-frequency quantitative tra...
Infini-gram: Scaling Unbounded N-gram Language Models
17 Sep 2025
Contributed by Lukas
This April 2025 paper introduces Infini-gram, a novel engine designed to scale n-gram language models to an unprecedented 5 trillion tokens and sup...
Generalist Reward Modeling with Inference-Time Scaling
16 Sep 2025
Contributed by Lukas
This April 2025 paper introduces Self-Principled Critique Tuning (SPCT), a novel method designed to enhance the inference-time scalability of Gene...
Hierarchical Reasoning Model: Brain-Inspired AI for Complex Tasks
16 Sep 2025
Contributed by Lukas
This August 2025 paper introduces the Hierarchical Reasoning Model (HRM), a novel AI architecture inspired by the human brain's hierarchical and mult...
Native Sparse Attention: Efficient Long-Context LLMs
16 Sep 2025
Contributed by Lukas
This February 2025 paper introduces Native Sparse Attention (NSA), a novel approach to address the computational demands of long-context modeling in ...
CodeI/O: Reasoning Patterns Through Code Input-Output Prediction
16 Sep 2025
Contributed by Lukas
This February 2025 paper introduces CodeI/O, a novel training method for Large Language Models (LLMs) that enhances general reasoning abilities by t...
Janus-Pro: Unified Multimodal AI with Scaled Improvements
16 Sep 2025
Contributed by Lukas
This January 2025 paper introduces Janus-Pro, an enhanced artificial intelligence model for multimodal understanding and generation. It builds upon ...
Federated Post-Training LLMs: An Accessibility and Efficiency Survey
16 Sep 2025
Contributed by Lukas
This August 2025 paper examines the evolving landscape of Federated Large Language Models (FedLLM), focusing on how large language models are post-t...
Non-Penetrative Tensor Partitioning for Collaborative AIoT Inference
16 Sep 2025
Contributed by Lukas
This June 2025 paper introduces Non-Penetrative Tensor Partitioning (NPTP), a novel method designed to improve the speed of collaborative inference fo...
Collaborative Edge Inference with Dynamic Task Offloading and Early Exiting
16 Sep 2025
Contributed by Lukas
This December 2024 paper introduces a collaborative inference framework designed for large-scale models in 5G smart city edge computing environmen...
Adaptive LLM Partitioning for Edge Inference
16 Sep 2025
Contributed by Lukas
This May 2025 paper introduces a resource-aware algorithm designed to optimize the performance of Large Language Models (LLMs) for low-latency inferen...
UQ: Unsolved Questions for Language Models
16 Sep 2025
Contributed by Lukas
This August 2025 paper introduces UQ, a novel evaluation framework designed to challenge large language models (LLMs) with complex, unsolved questions...
PETALS: Collaborative Large Language Model Inference and Fine-tuning
16 Sep 2025
Contributed by Lukas
This March 2023 paper introduces PETALS, a novel system designed to facilitate the collaborative inference and fine-tuning of large language models ...
AWQ: On-Device LLM Compression and Acceleration
15 Sep 2025
Contributed by Lukas
This July 2024 paper introduces Activation-aware Weight Quantization (AWQ), a novel method for compressing Large Language Models (LLMs) by quantizing ...
HybridServe: Efficient LLM Inference with Hybrid Caching
15 Sep 2025
Contributed by Lukas
This January 2025 paper introduces HybridServe, an LLM inference system designed to enhance throughput and cost-effectiveness for large language m...
FlexGen: High-Throughput LLM Inference on a Single GPU
15 Sep 2025
Contributed by Lukas
This June 2023 paper introduces FlexGen, a novel high-throughput generation engine designed to overcome the substantial computational and memory deman...
GraphSAGE: Inductive Representation Learning on Large Graphs
15 Sep 2025
Contributed by Lukas
This September 2018 paper introduces GraphSAGE, a novel inductive framework designed to generate node embeddings for large, evolving graphs, addres...
MetaGraph: knowledge graphs from financial NLP
15 Sep 2025
Contributed by Lukas
This September 2025 paper presents MetaGraph, a novel methodology for constructing knowledge graphs from scientific literature, specifically applie...
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
15 Sep 2025
Contributed by Lukas
This August 2025 paper explores the critical area of fact-checking and factuality evaluation in Large Language Models (LLMs). It systematically analy...
The Illusion of Diminishing Returns in LLM Execution
15 Sep 2025
Contributed by Lukas
This September 2025 paper explores the concept of long-horizon execution in Large Language Models (LLMs), arguing that marginal gains in single-step...
PyTorch FSDP: Scaling Fully Sharded Data Parallel
15 Sep 2025
Contributed by Lukas
This September 2023 paper introduces PyTorch Fully Sharded Data Parallel (FSDP), an advanced solution designed to scale the training of exceptionall...
Llama 3: Architecture, Capabilities, and Safety
14 Sep 2025
Contributed by Lukas
This November 2024 paper from the Meta Llama Team introduces Llama 3, a new family of large language models featuring 8B, 70B, and 405B paramete...
Graph Patterns of Knowledge in Large Language Models
14 Sep 2025
Contributed by Lukas
This May 2025 paper explores the structural patterns of knowledge within Large Language Models (LLMs) by adopting a graph-based perspective. The autho...
All for One: LLMs Solve Mental Math at the Last Token
13 Sep 2025
Contributed by Lukas
This research, published in September 2025, investigates how large language models (LLMs) perform mental math, particularly focusing on the flow of inform...
Survey of Reinforcement Learning for Large Reasoning Models
13 Sep 2025
Contributed by Lukas
This September 2025 paper provides a comprehensive overview of Reinforcement Learning (RL) as applied to Large Reasoning Models (LRMs). It breaks d...
SpikingBrain: Brain-Inspired LLMs for Efficient Long-Context Processing
13 Sep 2025
Contributed by Lukas
These September 2025 papers present a technical report on SpikingBrain, a novel family of large language models (LLMs) that draw inspiration from brai...
Statistical Methods for Generative AI Reliability
13 Sep 2025
Contributed by Lukas
This September 2025 paper explores the critical role of statistical methods in enhancing the reliability and functionality of Generative AI (GenAI), w...
EntiGraph: Scaling Language Models with Synthetic Pretraining
13 Sep 2025
Contributed by Lukas
This October 2024 paper introduces synthetic continued pretraining (synthetic CPT), a novel method designed to enhance language model knowledge acqu...
NOVELTYBENCH: Evaluating Language Model Diversity
12 Sep 2025
Contributed by Lukas
This August 2025 paper introduces NOVELTYBENCH, a new benchmark designed to evaluate how well large language models (LLMs) generate diverse and high...
HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization
12 Sep 2025
Contributed by Lukas
This April 2025 paper introduces HyperController, a novel and computationally efficient algorithm designed to optimize hyperparameters during the tra...