AI: post transformers

HBF: High Bandwidth Flash for AI Inferencing

15 Oct 2025

Contributed by Lukas

These sources and a patent discuss **SanDisk's development of High Bandwidth Flash (HBF)**, a technology designed to address the significant memory and ...

Architectural Migration to Multi-head Latent Attention

15 Oct 2025

Contributed by Lukas

The sources detail a novel method called **MHA2MLA** (Multi-Head Attention to Multi-Head Latent Attention), which efficiently adapts pre-trained large...

COPA: Composable On-Package GPU Architecture for Domain Specialization

15 Oct 2025

Contributed by Lukas

This April 2021 academic paper from **NVIDIA** discusses the challenge of designing **converged GPUs** that efficiently handle the diverging architect...

Performance of Confidential Computing for Large Language Models

11 Oct 2025

Contributed by Lukas

These sources collectively discuss advancements in **scalable, efficient, and secure machine learning (ML) data systems**, often within the context of...

Google: Confidential Computing with Accelerated AI Workloads on GCE

11 Oct 2025

Contributed by Lukas

The provided sources are a collection of Google Cloud documentation and blog excerpts detailing the features and implementation of **Confidential Comp...

AWS: Nitro System: Security, Enclaves, and Generative AI

11 Oct 2025

Contributed by Lukas

These sources provide an extensive overview of **AWS Nitro Enclaves**, an isolated compute environment designed to protect highly sensitive data withi...

Anthropic: Confidential Inference via Trusted Virtual Machines

11 Oct 2025

Contributed by Lukas

These sources, an announcement from Anthropic and a technical whitepaper co-authored with Pattern Labs, provide an **overview of Confidential Inferenc...

RAND: Securing AI Model Weights: Preventing Theft and Misuse

11 Oct 2025

Contributed by Lukas

The provided texts are excerpts from a **RAND Corporation research report** titled "Securing AI Model Weights: Preventing Theft and Misuse of Frontier...

Training-Free GRPO: Policy Optimization via Context Space

11 Oct 2025

Contributed by Lukas

The October 9, 2025 paper from **Tencent Youtu Lab** introduces **Training-Free Group Relative Policy Optimization (Training-Free GRPO)**, a novel met...

Multi-Agent Tool-Integrated Policy Optimization (MATPO)

11 Oct 2025

Contributed by Lukas

The October 6, 2025 paper introduces **Multi-Agent Tool-Integrated Policy Optimization (MATPO)**, a novel reinforcement learning framework designed to...

UniVideo: Unified Video Understanding, Generation, and Editing

11 Oct 2025

Contributed by Lukas

The October 9, 2025 paper details the architecture, training, and evaluation of **UniVideo**, a unified multimodal generative system capable of **hand...

Dragon Hatchling: Brain-Inspired AI Architecture

10 Oct 2025

Contributed by Lukas

This September 30, 2025 paper details research into **Brain Dynamics Hypothesis (BDH)** models, particularly the **BDH-GPU** architecture, which propos...

AGENTFLOW: In-the-Flow Agentic System Optimization

10 Oct 2025

Contributed by Lukas

The October 7, 2025 paper, a collaboration between Stanford University, Texas A&M University, UC San Diego, and Lambda, introduces **AGENTFLOW**, a no...

Less is More: Recursive Reasoning with Tiny Networks

10 Oct 2025

Contributed by Lukas

This October 6, 2025 paper from Alexia Jolicoeur-Martineau at Samsung SAIL Montréal provides an overview and detailed comparison of two recurrent re...

Early Experience for Language Agent Improvement

10 Oct 2025

Contributed by Lukas

This October 10, 2025 academic paper, a collaboration between Meta Superintelligence Labs, FAIR at Meta, and The Ohio State University, proposes and...

Petri: Accelerating AI Safety Auditing

10 Oct 2025

Contributed by Lukas

On October 6, 2025, Anthropic introduced **Petri (Parallel Exploration Tool for Risky Interactions)**, an open-source framework developed for automated...

Agentic Context Engineering: Evolving Contexts for Self-Improving LLMs

10 Oct 2025

Contributed by Lukas

The October 6, 2025 paper introduces **Agentic Context Engineering (ACE)**, a novel framework designed to enhance the performance of Large Language Mo...

CLUE: Hidden-State Clustering for Non-parametric Verification

10 Oct 2025

Contributed by Lukas

The October 2, 2025 technical report from **Tencent AI Lab** introduces **CLUE (Clustering and Experience-based Verification)**, a novel, non-parametr...

Low-Precision Transformer Failure in Flash Attention

10 Oct 2025

Contributed by Lukas

This October 5, 2025 paper presents the first mechanistic explanation for a persistent **training instability** experienced when using **low-precision ...

Paris: Decentralized Open-Weight Diffusion Model

08 Oct 2025

Contributed by Lukas

The October 2025 paper introduces **Paris**, a novel open-weight diffusion model for text-to-image generation that was trained using a completely **de...

DC-VideoGen: Efficient Video Generation with Deep Compression

08 Oct 2025

Contributed by Lukas

The September 29, 2025 paper introduces **DC-VideoGen**, a new post-training framework designed to significantly accelerate video diffusion models and ...

GNN101: Visual Learning of Graph Neural Networks

08 Oct 2025

Contributed by Lukas

The November 2024 paper introduces **GNN101**, an open-source, web-based interactive visualization tool designed to help non-experts learn about **Gra...

Reactive Transformer: Stateful Real-Time Language Models

08 Oct 2025

Contributed by Lukas

The October 2025 paper introduces the **Reactive Transformer (RxT)**, a novel neural network architecture designed by Adam Filipek and Reactive AI to ...

Imperceptible Jailbreaking Against Large Language Models

08 Oct 2025

Contributed by Lukas

The October 2025 academic paper introduces a novel **imperceptible jailbreaking attack** against Large Language Models (LLMs) that exploits Unicode **...

ACON: Optimizing Context Compression for LLM Agents

08 Oct 2025

Contributed by Lukas

The October 2025 paper provides an overview of **Agent Context Optimization (ACON)**, a novel framework designed to enhance the efficiency and performa...

CoDA: Collaborative Multi-Agent Data Visualization

08 Oct 2025

Contributed by Lukas

The October 2025 paper introduces **CoDA (Collaborative Data-visualization Agents)**, a novel multi-agent system designed to automate complex data vis...

RECAP: Safety Alignment via Counter-Aligned Prefilling

08 Oct 2025

Contributed by Lukas

The October 2025 academic paper introduces **RECAP (Robust Safety Alignment via Counter-Aligned Prefilling)**, a novel reinforcement learning (RL) met...

ONNX Ecosystem, Optimization, and Deployment

08 Oct 2025

Contributed by Lukas

The provided sources center on the **Open Neural Network Exchange (ONNX)** format and its inference engine, **ONNX Runtime**, highlighting their role ...

Emergent Abilities of Large Language Models

08 Oct 2025

Contributed by Lukas

The sources (October 2022, March 2025) provide an extensive examination of **emergent abilities** in large language models (LLMs), defining them as un...

Implicit Dynamics of In-Context Learning

08 Oct 2025

Contributed by Lukas

This July 2025 research paper explores **In-Context Learning (ICL)** in Large Language Models (LLMs), which is the striking ability of these models to...

Contextual Blocks: Implicit Weight Updates and Federated Learning

08 Oct 2025

Contributed by Lukas

We compare and contrast the math behind two recent research papers which we have covered individually before on this podcast: July 2025: Learning withou...

MotionRAG: Retrieval-Augmented Image-to-Video Generation

08 Oct 2025

Contributed by Lukas

The September 2025 paper introduces **MotionRAG**, a novel retrieval-augmented framework designed to enhance motion realism in image-to-video generati...

NIST Evaluation of DeepSeek AI Models

08 Oct 2025

Contributed by Lukas

The provided text is an excerpt from a **technical evaluation report** conducted by the Center for AI Standards and Innovation (CAISI), housed within ...

Test-Time Reinforcement Learning for LLMs

08 Oct 2025

Contributed by Lukas

This June 2025 paper introduces a novel methodology called **Test-Time Reinforcement Learning (TTRL)**, which enables Large Language Models (LLMs) to ...

LongCodeZip: Compress Long Code Context for LLMs

08 Oct 2025

Contributed by Lukas

The October 2025 paper introduces **LongCodeZip**, a novel, training-free, and model-agnostic framework designed for **compressing long code contexts*...

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

08 Oct 2025

Contributed by Lukas

The September 2025 paper introduces **ReasoningBank**, a novel memory framework designed to enhance Large Language Model (LLM) agents by distilling an...

Analog In-Memory Attention for Energy-Efficient LLMs

08 Oct 2025

Contributed by Lukas

This November 2024 paper and new analysis in September 2025 provide a comprehensive overview of a novel **Analog In-Memory Computing (AIMC)** architec...

Regression Language Models for Code Metrics

03 Oct 2025

Contributed by Lukas

This September 30, 2025 academic paper introduces Regression Language Models (RLMs) as a unified method for code-to-metric regression, which is the ta...

Introducing RTEB: Retrieval Embedding Benchmark

03 Oct 2025

Contributed by Lukas

The text introduces the **Retrieval Embedding Benchmark (RTEB)**, a new standard designed to accurately evaluate the **retrieval accuracy of embedding...

CUDA Unified Memory and Heterogeneous Memory Management

02 Oct 2025

Contributed by Lukas

The provided sources offer a comprehensive look at memory management for GPU-accelerated computing, focusing heavily on **Heterogeneous Memory Managem...

Moravec's Paradox and AI Automation Limits

01 Oct 2025

Contributed by Lukas

These two 2025 research papers together examine **Moravec's Paradox**, which posits that skills effortless for humans (like perception and mobi...

Characterizing LLM KV Cache Workloads in Production

01 Oct 2025

Contributed by Lukas

The June 2025 paper characterizes and optimizes the **Key-Value Cache (KV$)** workload patterns associated with serving large language models (LLMs) a...

BurstGPT: A Real-World LLM Serving Workload Dataset

01 Oct 2025

Contributed by Lukas

The May 2025 academic paper introduces **BurstGPT**, a novel, real-world workload dataset consisting of over ten million traces from regional Azure Op...

Qwen3-Next & Qwen3-Omni technical report

30 Sep 2025

Contributed by Lukas

These May and September 2025 technical reports introduce and evaluate two distinct but related large language models: the **Qwen3 family** and the **Q...

Variational Reasoning Framework for Language Models

29 Sep 2025

Contributed by Lukas

This September 26, 2025 paper introduces a variational reasoning framework designed to enhance the reasoning cap...

Federated Learning with Soft Embeddings for Retrieval

27 Sep 2025

Contributed by Lukas

This September 20, 2025 paper introduces a novel, efficient architecture for training **retrieval models** used in retrieval-augmented generation (RAG) ...

Schoenfeld Theory Applied to Large Reasoning Models

27 Sep 2025

Contributed by Lukas

This September 18, 2025 paper introduces a research project that applies **Schoenfeld’s Episode Theory**, a classic cognitive framework for analyzing...

CWM: Code Generation with World Models

27 Sep 2025

Contributed by Lukas

This September 24, 2025 paper from Meta provides an extensive overview of **Code World Model (CWM)**, a 32-billion-parameter dense decoder-only Transformer ...

EmbeddingGemma: Powerful Lightweight Text Representations

26 Sep 2025

Contributed by Lukas

The September 24, 2025 paper introduces **EmbeddingGemma**, a novel, lightweight text embedding model developed by **Google DeepMind**, built upon the ...

CE-GPPO: Controlling Entropy via Gradient-Preserving Policy Optimization

26 Sep 2025

Contributed by Lukas

The September 25, 2025 paper introduces a novel reinforcement learning (RL) algorithm, **Controlling Entropy via Gradient-Preserving Policy Optimizatio...

Seedream 4.0: Multimodal Image Generation System

26 Sep 2025

Contributed by Lukas

The September 24, 2025 paper is a technical report from **ByteDance Seed** detailing the **Seedream 4.0** system, an advanced multimodal image generati...

Tree-based Group Policy Optimization for LLM Agents

26 Sep 2025

Contributed by Lukas

The September 25, 2025 paper introduces **Tree-based Group Relative Policy Optimization (Tree-GRPO)**, a new reinforcement learning (RL) method designe...

GDPval: Measuring AI Performance on Real-World Work

26 Sep 2025

Contributed by Lukas

The sources, dated September 25, 2025, introduce **GDPval**, a novel benchmark created by OpenAI to evaluate the performance of **AI models** on **econom...

Adaptive Compression Techniques for Efficient LLM Inference

20 Sep 2025

Contributed by Lukas

These 14 research papers provide an overview of various **compression techniques for Large Language Models (LLMs)**, primarily focusing on **reducing ...

LLM-I: Interleaved Multimodal Creators via Tool-Use

20 Sep 2025

Contributed by Lukas

The September 2025 academic paper introduces **LLM-Interleaved (LLM-I)**, a novel, flexible framework for interleaved image-text generation that refra...

Evolving Language Models Without Labels: EVOL-RL

19 Sep 2025

Contributed by Lukas

This September 2025 research paper from Tencent AI Lab and academic collaborators introduces EVOL-RL, an Evolution-Oriented ...

SearchInstruct: Instruction Tuning with Dynamic Retrieval

19 Sep 2025

Contributed by Lukas

This September 2025 paper introduces SearchInstruct, a novel framework designed to enhance Supervised Fine-Tuning (SFT) of large language models (LLMs...

THOR: Hierarchical RL for Mathematical Reasoning

19 Sep 2025

Contributed by Lukas

This September 2025 paper describes THOR (Tool-Integrated Hierarchical Optimization via RL), a novel approach designed to enhance the mathematical re...

The Uneven Diffusion of AI Adoption

19 Sep 2025

Contributed by Lukas

The "Anthropic Economic Index report" documents the rapid and uneven adoption of Artificial Intelligence (AI), specifically using data from the compan...

FlowRL: Distribution Matching for LLM Reasoning

19 Sep 2025

Contributed by Lukas

This September 2025 paper introduces FlowRL, a novel reinforcement learning (RL) algorithm for large language models (LLMs) that shifts the optimizat...

Single-stream Policy Optimization for LLMs

19 Sep 2025

Contributed by Lukas

This September 2025 paper introduces Single-stream Policy Optimization (SPO), a new reinforcement learning algorithm for training Large Language Mode...

Pre-computing & reusing KV caches to accelerate RAG inference

18 Sep 2025

Contributed by Lukas

How can pre-computing and reusing Key-Value (KV) caches accelerate inference for Retrieval-Augmented Generation and other long-context LLM tasks? The p...

REFRAG: Rethinking RAG-based Decoding

18 Sep 2025

Contributed by Lukas

This September 2025 academic paper, titled "REFRAG: Rethinking RAG based Decoding," appears on the alphaXiv pre-print server. It focuses on Reframing ...

DeepSeek-R1: Reinforcing LLM Reasoning Through Self-Evolution

18 Sep 2025

Contributed by Lukas

This paper, published in Nature on September 17, 2025, "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," details the develop...

ShadowKV: High-Throughput Long-Context LLM Inference

17 Sep 2025

Contributed by Lukas

This April 2025 paper introduces ShadowKV, an innovative inference system for long-context Large Language Models (LLMs) designed to significantly e...

TailorKV: Hybrid KV Cache Compression for LLMs

17 Sep 2025

Contributed by Lukas

This May 2025 paper introduces TailorKV, a novel hybrid framework designed to optimize Key-Value (KV) cache management in large language models (LLMs)...

MIRAGE: Optimizing LLM KV Cache with Parameter Remapping

17 Sep 2025

Contributed by Lukas

This July 2025 paper discusses advanced memory optimization techniques for Large Language Models (LLMs), particularly focusing on KV cache managemen...

WebSailor-V2: Bridging Proprietary Agents with Synthetic Data and RL

17 Sep 2025

Contributed by Lukas

This September 2025 paper introduces WebSailor-V2, an open-source deep research agent developed by Alibaba Group's Tongyi Lab. The paper details a ...

Dynamic Chunking for Hierarchical Sequence Modeling

17 Sep 2025

Contributed by Lukas

This July 2025 paper introduces Hierarchical Networks (H-Nets), a novel architecture designed to move beyond traditional tokenization in large langua...

LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning

17 Sep 2025

Contributed by Lukas

This September 2025 paper introduces LoFT, a novel framework designed to improve Long-Tailed Semi-Supervised Learning (LTSSL) by leveraging paramet...

QuantAgent: Multi-Agent LLM for High-Frequency Trading

17 Sep 2025

Contributed by Lukas

This September 2025 paper describes QuantAgent, a novel multi-agent large language model (LLM) framework designed for high-frequency quantitative tra...

Infini-gram: Scaling Unbounded N-gram Language Models

17 Sep 2025

Contributed by Lukas

This April 2025 paper introduces Infini-gram, a novel engine designed to scale n-gram language models to an unprecedented 5 trillion tokens and sup...

Generalist Reward Modeling with Inference-Time Scaling

16 Sep 2025

Contributed by Lukas

This April 2025 paper introduces Self-Principled Critique Tuning (SPCT), a novel method designed to enhance the inference-time scalability of Gene...

Hierarchical Reasoning Model: Brain-Inspired AI for Complex Tasks

16 Sep 2025

Contributed by Lukas

This August 2025 paper introduces the Hierarchical Reasoning Model (HRM), a novel AI architecture inspired by the human brain's hierarchical and mult...

Native Sparse Attention: Efficient Long-Context LLMs

16 Sep 2025

Contributed by Lukas

This February 2025 paper introduces Native Sparse Attention (NSA), a novel approach to address the computational demands of long-context modeling in ...

CodeI/O: Reasoning Patterns Through Code Input-Output Prediction

16 Sep 2025

Contributed by Lukas

This February 2025 paper introduces CodeI/O, a novel training method for Large Language Models (LLMs) that enhances general reasoning abilities by t...

Janus-Pro: Unified Multimodal AI with Scaled Improvements

16 Sep 2025

Contributed by Lukas

This January 2025 paper introduces Janus-Pro, an enhanced artificial intelligence model for multimodal understanding and generation. It builds upon ...

Federated Post-Training LLMs: An Accessibility and Efficiency Survey

16 Sep 2025

Contributed by Lukas

This August 2025 paper examines the evolving landscape of Federated Large Language Models (FedLLM), focusing on how large language models are post-t...

Non-Penetrative Tensor Partitioning for Collaborative AIoT Inference

16 Sep 2025

Contributed by Lukas

This June 2025 paper introduces Non-Penetrative Tensor Partitioning (NPTP), a novel method designed to improve the speed of collaborative inference fo...

Collaborative Edge Inference with Dynamic Task Offloading and Early Exiting

16 Sep 2025

Contributed by Lukas

This December 2024 paper introduces a collaborative inference framework designed for large-scale models in 5G smart city edge computing environmen...

Adaptive LLM Partitioning for Edge Inference

16 Sep 2025

Contributed by Lukas

This May 2025 paper introduces a resource-aware algorithm designed to optimize the performance of Large Language Models (LLMs) for low-latency inferen...

UQ: Unsolved Questions for Language Models

16 Sep 2025

Contributed by Lukas

This August 2025 paper introduces UQ, a novel evaluation framework designed to challenge large language models (LLMs) with complex, unsolved questions...

PETALS: Collaborative Large Language Model Inference and Fine-tuning

16 Sep 2025

Contributed by Lukas

This March 2023 paper introduces PETALS, a novel system designed to facilitate the collaborative inference and fine-tuning of large language models ...

AWQ: On-Device LLM Compression and Acceleration

15 Sep 2025

Contributed by Lukas

This July 2024 paper introduces Activation-aware Weight Quantization (AWQ), a novel method for compressing Large Language Models (LLMs) by quantizing ...

HybridServe: Efficient LLM Inference with Hybrid Caching

15 Sep 2025

Contributed by Lukas

This January 2025 paper introduces HybridServe, an LLM inference system designed to enhance throughput and cost-effectiveness for large language m...

FlexGen: High-Throughput LLM Inference on a Single GPU

15 Sep 2025

Contributed by Lukas

This June 2023 paper introduces FlexGen, a novel high-throughput generation engine designed to overcome the substantial computational and memory deman...

GraphSAGE: Inductive Representation Learning on Large Graphs

15 Sep 2025

Contributed by Lukas

This September 2018 paper introduces GraphSAGE, a novel inductive framework designed to generate node embeddings for large, evolving graphs, addres...

MetaGraph: knowledge graphs from financial NLP

15 Sep 2025

Contributed by Lukas

This September 2025 paper presents MetaGraph, a novel methodology for constructing knowledge graphs from scientific literature, specifically applie...

Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models

15 Sep 2025

Contributed by Lukas

This August 2025 paper explores the critical area of fact-checking and factuality evaluation in Large Language Models (LLMs). It systematically analy...

The Illusion of Diminishing Returns in LLM Execution

15 Sep 2025

Contributed by Lukas

This September 2025 paper explores the concept of long-horizon execution in Large Language Models (LLMs), arguing that marginal gains in single-step...

PyTorch FSDP: Scaling Fully Sharded Data Parallel

15 Sep 2025

Contributed by Lukas

This September 2023 paper introduces PyTorch Fully Sharded Data Parallel (FSDP), an advanced solution designed to scale the training of exceptionall...

Llama 3: Architecture, Capabilities, and Safety

14 Sep 2025

Contributed by Lukas

This November 2024 paper from the Meta Llama Team introduces Llama 3, a new family of large language models featuring 8B, 70B, and 405B paramete...

Graph Patterns of Knowledge in Large Language Models

14 Sep 2025

Contributed by Lukas

This May 2025 paper explores the structural patterns of knowledge within Large Language Models (LLMs) by adopting a graph-based perspective. The autho...

All for One: LLMs Solve Mental Math at the Last Token

13 Sep 2025

Contributed by Lukas

This research, published in September 2025, investigates how large language models (LLMs) perform mental math, particularly focusing on the flow of inform...

Survey of Reinforcement Learning for Large Reasoning Models

13 Sep 2025

Contributed by Lukas

This September 2025 paper provides a comprehensive overview of Reinforcement Learning (RL) as applied to Large Reasoning Models (LRMs). It breaks d...

SpikingBrain: Brain-Inspired LLMs for Efficient Long-Context Processing

13 Sep 2025

Contributed by Lukas

These September 2025 papers present a technical report on SpikingBrain, a novel family of large language models (LLMs) that draw inspiration from brai...

Statistical Methods for Generative AI Reliability

13 Sep 2025

Contributed by Lukas

This September 2025 paper explores the critical role of statistical methods in enhancing the reliability and functionality of Generative AI (GenAI), w...

EntiGraph: Scaling Language Models with Synthetic Pretraining

13 Sep 2025

Contributed by Lukas

This October 2024 paper introduces synthetic continued pretraining (synthetic CPT), a novel method designed to enhance language model knowledge acqu...

NOVELTYBENCH: Evaluating Language Model Diversity

12 Sep 2025

Contributed by Lukas

This August 2025 paper introduces NOVELTYBENCH, a new benchmark designed to evaluate how well large language models (LLMs) generate diverse and high...

HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization

12 Sep 2025

Contributed by Lukas

This April 2025 paper introduces HyperController, a novel and computationally efficient algorithm designed to optimize hyperparameters during the tra...
