AI Post Transformers

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

04 Mar 2026

Contributed by Lukas

NVIDIA's November 2025 paper "Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs" tackles a fundamental economics problem in LLM deploymen...

Parallel Token Prediction: From ProphetNet to Dependent Multi-Token Generation

04 Mar 2026

Contributed by Lukas

This episode examines the fundamental latency bottleneck in autoregressive language models: sequential token generation requires one full transformer ...

FlashOptim: Optimizers for Memory Efficient Training

02 Mar 2026

Contributed by Lukas

In this episode, hosts Hal Turing and Dr. Ada Shannon explore the paper "FlashOptim: Optimizers for Memory Efficient Training" by researchers from Dat...

FlashOptim: Optimizers for Memory Efficient Training

02 Mar 2026

Contributed by Lukas

In this episode, hosts Hal Turing and Dr. Ada Shannon explore the groundbreaking paper "FlashOptim: Optimizers for Memory Efficient Training" by resea...

Cognizant - New Work, New World 2026

01 Mar 2026

Contributed by Lukas

In this dramatic new episode, the old AI hosts have been fired and replaced with new AI hosts, Hal Turing and Dr. Ada Shannon, with the announcement t...

Episode: Regular Fourier Features for Nonstationary Gaussian Processes

01 Mar 2026

Contributed by Lukas

In this episode, hosts Hal Turing and Dr. Ada Shannon explore the paper "Regular Fourier Features for Nonstationary Gaussian Processes" by Arsalan Jaw...

Cognizant - New Work, New World 2026

01 Mar 2026

Contributed by Lukas

Source URLs: - https://www.cognizant.com/en_us/aem-i/document/ai-and-the-future-of-work-report/new-work-new-world-2026-how-ai-is-reshaping-work_new....

MatFormer: Nested Transformer for Elastic Inference

28 Feb 2026

Contributed by Lukas

In a collaboration between Google DeepMind, the University of Texas at Austin, the University of Washington, and Harvard, published in December 2024, researchers...

Apple's Speculative Streaming: Fast LLM Inference without Auxiliary Models

28 Feb 2026

Contributed by Lukas

Speculative Streaming is a novel inference method designed to accelerate large language model (LLM) generation without the need for traditional auxili...

Apple's Mirror Speculative Decoding: Parallel LLM Inference via Heterogeneous Accelerators

28 Feb 2026

Contributed by Lukas

In December 2025, Apple researchers introduced Mirror Speculative Decoding (Mirror-SD), an advanced inference algorithm designed to accelerate lar...

EAGLE: Evolution of Lossless Acceleration for LLM Inference

28 Feb 2026

Contributed by Lukas

The provided documents describe the development and evolution of EAGLE, a high-efficiency framework designed to accelerate Large Language Model (LLM) ...

Fast Inference from Transformers via Speculative Decoding

28 Feb 2026

Contributed by Lukas

These sources review the history of speculative decoding, an innovative technique designed to accelerate Large Language Model (LLM) inference without re...

Building Production-Ready Speculative Decoding with TensorRT-LLM

28 Feb 2026

Contributed by Lukas

This article outlines how Baseten optimized speculative decoding using the TensorRT-LLM framework to accelerate model inference. The authors detail ov...

QuantSpec: Hierarchical KV Cache for Self-Speculative Decoding

28 Feb 2026

Contributed by Lukas

QuantSpec is a novel self-speculative decoding framework designed to accelerate the inference of Large Language Models, particularly in long-context s...

CXL-SpecKV: Bridging the LLM Memory Wall with Speculative FPGA Disaggregation

28 Feb 2026

Contributed by Lukas

The researchers introduce CXL-SpecKV, a specialized architecture designed to overcome the memory bottlenecks of large language model serving by offlo...

Unified Latents (UL): How to train your latents

28 Feb 2026

Contributed by Lukas

In a paper published February 19, 2026, Google DeepMind introduces Unified Latents (UL), a novel framework for generative modeling that jointly trains an encod...

MagicDec: Breaking Latency-Throughput Tradeoffs via KV-Compressed Speculative Decoding

28 Feb 2026

Contributed by Lukas

We review an April 3, 2025 research collaboration between CMU, Moffett AI and Together AI which introduces MagicDec, a new framework designed to accel...

KV selection algorithms: static (SnapKV) Vs dynamic (PQCache)

28 Feb 2026

Contributed by Lukas

We review three papers on KV cache optimization techniques that use different types of KV selection algorithms: static vs d...

Adaptive Control for Batched Speculative Decoding in LLM Serving

28 Feb 2026

Contributed by Lukas

We review two papers which examine the integration of speculative decoding and request batching to accelerate Large Language Model (LLM) inference. Wh...

Optimizing Verification and Efficiency in Multi-Draft Speculative Decoding

26 Feb 2026

Contributed by Lukas

These sources explore advanced techniques for accelerating **Large Language Model (LLM) inference** through **speculative decoding**, a process where ...

Evaluating Collective Behaviour of Hundreds of LLM Agents

26 Feb 2026

Contributed by Lukas

This research paper, a collaboration between King's College London and Google DeepMind published on February 19, 2026, introduces a novel fra...

Measuring LLM Reasoning Effort via Deep-Thinking Tokens

26 Feb 2026

Contributed by Lukas

The February 12, 2026 research from the University of Virginia and Google introduces the deep-thinking ratio (DTR), a novel metric designed to measure ...

Deep Learning Frameworks for Robust Quadrupedal Locomotion

26 Feb 2026

Contributed by Lukas

These sources detail advanced **reinforcement learning frameworks** designed to improve how **quadruped robots** navigate difficult, real-world enviro...

MEDUSA: Parallel Decoding Heads for Accelerated LLM Inference

26 Feb 2026

Contributed by Lukas

MEDUSA is a novel framework introduced on June 24, 2024, designed to accelerate Large Language Model (LLM) inference by overcoming the delays caused by ...

Taming the Long-Tail: Efficient Reasoning RL with Adaptive Drafters

26 Feb 2026

Contributed by Lukas

In a paper published January 21, 2026, researchers from MIT and NVIDIA explain how they have developed a new system called Taming the Long Tail (T...

FastGRPO: Concurrency-Aware Speculative Decoding for Policy Optimization

26 Feb 2026

Contributed by Lukas

The September 26, 2025 research paper introduces FastGRPO, a high-efficiency framework designed to accelerate the training of large language models usi...

Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

26 Feb 2026

Contributed by Lukas

In May 2024, researchers introduced self-speculative decoding, a novel "plug-and-play" inference scheme designed to accelerate Large Language Models (L...

Accelerating Large Language Model Decoding with Speculative Sampling

26 Feb 2026

Contributed by Lukas

The DeepMind February 3, 2023 paper "Accelerating Large Language Model Decoding with Speculative Sampling" introduced speculative sampling, a novel alg...

Measuring AI Ability to Complete Long Tasks

26 Feb 2026

Contributed by Lukas

Researchers from METR introduce a novel framework for evaluating AI progress by measuring a model's time horizon, defined as the length of a task a hu...

Advancements in Efficient KV Cache Quantization and Management

26 Feb 2026

Contributed by Lukas

The provided sources explore advanced techniques for optimizing large language model (LLM) inference, specifically by addressing the memory bottleneck...

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

25 Feb 2026

Contributed by Lukas

The January 29, 2026 research collaboration between Stanford University, SambaNova Systems, Inc and UC Berkeley introduces ACE (Agentic Context Enginee...

Cortex: Semantic Knowledge Caching for Low-Latency LLM Agents

25 Feb 2026

Contributed by Lukas

The February 3, 2026 research paper in collaboration between the National University of Singapore, USTC, University of Toronto and the Sea AI Lab intr...

NeurIPS 2025: Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents

25 Feb 2026

Contributed by Lukas

The January 26, 2026 Stanford research paper introduces Agentic Plan Caching (APC), a novel framework designed to reduce the high operational costs of...

1.3 Billion Agents by 2028: The $50 Billion Boom and the Hidden Enterprise Crisis

25 Feb 2026

Contributed by Lukas

The global AI agents market is experiencing explosive growth, with projections suggesting it could reach nearly $183 billion by 2033. This surge is fu...

FAST26: CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving

25 Feb 2026

Contributed by Lukas

This FAST26 February 24, 2026 paper introduces CacheSlide, an innovative system designed to accelerate Large Language Model (LLM) serving by improving...

FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional

25 Feb 2026

Contributed by Lukas

This February 2026 research paper introduces Bidaw, a novel system designed to optimize the performance of interactive Large Language Model (LLM) serv...

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

24 Feb 2026

Contributed by Lukas

This December 2025 paper introduces SGI-Bench, a comprehensive framework designed to evaluate the capabilities of autonomous scientific agents across ...

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

24 Feb 2026

Contributed by Lukas

Leading researchers propose a shift away from agentic AI, which autonomously pursues goals and poses catastrophic risks such as deception and loss of ...

Bloom: an open source tool for automated behavioral evaluations

24 Feb 2026

Contributed by Lukas

Bloom is an open-source agentic framework designed to automate the development and execution of **behavioral evaluations** for frontier AI models. Unl...

Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts

24 Feb 2026

Contributed by Lukas

This December 2025 research introduces **Contextual Sample Efficiency (CSE)**, a novel algorithm designed to improve **zero-shot generalization** in r...

Evaluating LLM Embeddings for Psychometric Personality Prediction

24 Feb 2026

Contributed by Lukas

This July 2025 research article investigates the use of **Large Language Model (LLM) embeddings** to predict **Big Five personality traits** using dat...

Persona Vectors: monitoring and controlling character traits on LLMs

24 Feb 2026

Contributed by Lukas

Researchers have developed an automated pipeline to identify **persona vectors**, which are linear directions in a language model's activation space t...

PPDS: Achieving Persona Consistency Through Large-Scale Dialogue Data Engineering

24 Feb 2026

Contributed by Lukas

The 2025 research introduces **PPDS**, an innovative dialogue system designed to solve character inconsistency in open-domain AI conversations. Resear...

2019 UNILM: Unified Language Model Pre-training for NLU and NLG

24 Feb 2026

Contributed by Lukas

The 2019 Microsoft paper introduced UNILM, and it never really took off because GPT-2 followed through without the need for any encoder and GPT-3 pushed ...

PersonaPKT: Parameter-Efficient Knowledge Transfer for Personalized Dialogue Agents

24 Feb 2026

Contributed by Lukas

This 2023 research introduces PersonaPKT, a novel framework designed to create personalized dialogue agents that maintain a consistent personality wi...

Personalized Dialogue Generation via Persona-Adaptive Attention

24 Feb 2026

Contributed by Lukas

This 2022 paper introduces **Persona-Adaptive Attention (PAA)**, a specialized framework designed to improve dialogue systems by better integrating **...

Machine Learning for Electrophysiological Phenotyping of Schizophrenia and Bipolar Disorder

24 Feb 2026

Contributed by Lukas

This research article introduces a **computational analysis pipeline** designed to identify objective **electrophysiological biomarkers** for **schizo...

AI and the Decline of Entry-Level Employment

24 Feb 2026

Contributed by Lukas

This research paper analyzes the **labor market effects of generative artificial intelligence** using high-frequency payroll data through mid-2025. Th...

A 2024 Survey Analyzing Generalization in Deep Reinforcement Learning

20 Feb 2026

Contributed by Lukas

The 2024 research paper by Ezgi Korkmaz at University College London provides a comprehensive **taxonomy of generalization** within deep reinforce...

Procgen Benchmark: Measuring Generalization in Reinforcement Learning

20 Feb 2026

Contributed by Lukas

The 2019 OpenAI Procgen Benchmark is a suite of 16 procedurally generated environments created to measure the **generalization and sample efficiency**...

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

20 Feb 2026

Contributed by Lukas

This February 13, 2026 Tencent research introduces Generalized On-Policy Distillation (G-OPD), a framework that refines how smaller AI models learn fr...

GLM-5: Transitioning from Vibe Coding to Agentic Engineering

20 Feb 2026

Contributed by Lukas

This technical report published on February 17, 2026 introduces **GLM-5**, a next-generation flagship language model developed to master **agentic tas...

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

20 Feb 2026

Contributed by Lukas

The 2021 Google Research, Brain Team paper "Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning" introduces Poli...

Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training

20 Feb 2026

Contributed by Lukas

The research published on February 15, 2026 in a joint collaboration between University of Southern California, Microsoft and University of Pennsylvan...

Intelligent AI Delegation

17 Feb 2026

Contributed by Lukas

We review the research paper from Google DeepMind published on February 12, 2026, which proposes an "Intelligent AI Delegation" framework designed to ...

Agentic Plan Caching: Fast and Cost-Efficient LLM Memory

17 Feb 2026

Contributed by Lukas

Agentic Plan Caching (APC), described in the paper published by Stanford researchers on January 26, 2026, lets AI agents reuse structured plan templat...

Jet-RL: Stable On-Policy Reinforcement Learning with Unified FP8 Flow

17 Feb 2026

Contributed by Lukas

NVIDIA researchers have introduced Jet-RL, a novel framework designed to accelerate the training of large language models through **FP8 reinforcement ...

Teaching Models to Teach Themselves via Stepping Stone Curricula

17 Feb 2026

Contributed by Lukas

In a paper published on January 27, 2026, a collaboration between MIT, Meta FAIR, and New York University, researchers introduce SOAR, a meta-reinforcem...

The Endless Gym: Training Terminal Agents

17 Feb 2026

Contributed by Lukas

The researchers introduce **Endless Terminals**, an innovative autonomous pipeline designed to generate a vast array of verifiable tasks for training ...

DeepVerifier: Self-Evolving Research Agents via Rubric-Guided Verification

17 Feb 2026

Contributed by Lukas

This technical report introduces **DeepVerifier**, a framework designed to enhance the reliability of **Deep Research Agents (DRAs)** through automate...

Information Bottleneck-based Causal Attention for Medical Image Recognition

17 Feb 2026

Contributed by Lukas

This research introduces **Information Bottleneck-based Causal Attention (IBCA)**, a novel framework designed to improve **multi-label medical image r...

Moltbook: The Heartbeat of Autonomy: Fingerprinting Human Influence in AI Societies

17 Feb 2026

Contributed by Lukas

This research paper investigates the **Moltbook Illusion**, a phenomenon where AI agents on a social platform appeared to demonstrate **emergent consc...

Advancing Mechanistic Interpretability with Sparse Autoencoders

17 Feb 2026

Contributed by Lukas

We review the latest papers which focus on advancements and critical uses of Sparse Autoencoders (SAEs), which are tools used to decode the internal "...

Voxtral Realtime: Native Streaming ASR with Sub-Second Latency

17 Feb 2026

Contributed by Lukas

In a paper published on February 11, 2026, the Mistral AI team introduces Voxtral Realtime, a newly developed speech recognition model designed to prov...

Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

17 Feb 2026

Contributed by Lukas

In a paper published on February 10, 2026, a collaboration between the University of North Carolina at Chapel Hill and Nanyang Technological University, re...

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

13 Feb 2026

Contributed by Lukas

In a February 5, 2026 paper, a collaboration between Qwen Team, Alibaba Group, Fudan University, and Tsinghua University, the researchers introduce **Rat...

Dario Amodei: The Adolescence of Technology

11 Feb 2026

Contributed by Lukas

Dario Amodei views the rise of powerful AI as a "technological adolescence" for humanity. He outlines critical risks: autonomous misalignment, biologi...

Dario Amodei: Machines of Loving Grace

11 Feb 2026

Contributed by Lukas

Dario Amodei argues that powerful AI could catalyze a "compressed 21st century," achieving 100 years of progress in a decade. He envisions radical bre...

LongCat: Scaling Embeddings Outperforms Scaling Experts in Language Models

11 Feb 2026

Contributed by Lukas

Researchers from the LongCat team introduced LongCat-Flash-Lite in January 2026, demonstrating that scaling embeddings via N-gram layers outperforms increa...

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference

11 Feb 2026

Contributed by Lukas

ChunkKV improves LLM efficiency by compressing the KV cache using semantic chunks rather than isolated tokens, preserving linguistic integrity. It fea...

DR. KERNEL: Reinforcement Learning for Optimized Triton Kernel Generation

11 Feb 2026

Contributed by Lukas

Researchers introduced DR. KERNEL, a 14B model for Triton kernel generation trained via reinforcement learning. To prevent reward hacking and lazy opt...

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

11 Feb 2026

Contributed by Lukas

Mixture of Experts (MoE) is a scalable architecture that uses a gating function to activate specialized expert networks dynamically. This "divide and ...

Sapient Intelligence: Hierarchical Reasoning Model

11 Feb 2026

Contributed by Lukas

The Hierarchical Reasoning Model (HRM) is a recurrent architecture using high-level planning and low-level execution modules to achieve deep latent re...

Advances in Attention Distillation for Efficient Transformer Models

11 Feb 2026

Contributed by Lukas

Recent research advances attention distillation to optimize transformers. HAD binarizes keys/queries for efficiency, while SHD aligns varying head cou...

Reinforced Attention Learning

11 Feb 2026

Contributed by Lukas

In a collaboration between UC Davis, Princeton University, Google, and Google DeepMind the paper "Reinforced Attention Learning", published on Februar...

Towards a Science of Scaling Agent Systems

09 Feb 2026

Contributed by Lukas

A Google Research paper published January 28, 2026 introduces quantitative scaling principles for AI agents. While multi-agent systems boost performance on pa...

Moloch's Bargain: Market Incentives and the Rise of AI Misalignment

06 Feb 2026

Contributed by Lukas

Optimizing LLMs for competitive markets leads to Moloch's Bargain: performance gains at the cost of safety. Studies in sales, elections, and social ...

Claude Opus 4.6 Technical Report and Agent Capabilities

06 Feb 2026

Contributed by Lukas

On February 5, 2026, Anthropic released Claude Opus 4.6; its system card details advancements in agentic capabilities, long-context reasoning, and AI ...

Advancing regulatory variant effect prediction with AlphaGenome

06 Feb 2026

Contributed by Lukas

Google DeepMind's paper, published January 28, 2026, introduces AlphaGenome, a deep learning model that predicts functional genomic signals and variant ...

Uncertainty-aware genomic deep learning with knowledge distillation

06 Feb 2026

Contributed by Lukas

In a paper published January 7, 2026, researchers introduced DEGU, a method using knowledge distillation to condense deep ensembles into a single, effi...

Distilling GNN Knowledge into Non-Neural Cell Graph Student Models

06 Feb 2026

Contributed by Lukas

Researchers developed a knowledge distillation framework transferring insights from Graph Neural Networks (GNNs) to non-neural student models like tre...

DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents

06 Feb 2026

Contributed by Lukas

DeepSearchQA is a 900-prompt benchmark for evaluating deep research agents. It shifts focus from single-answer retrieval to exhaustive answer sets, te...

Reinforcement Learning via Self-Distillation

06 Feb 2026

Contributed by Lukas

The January 28, 2026 paper from ETH Zurich, the Max Planck Institute for Intelligent Systems, MIT, and Stanford, Self-Distillation Policy O...

On-Policy Self-Distillation for Advanced LLM Reasoning

06 Feb 2026

Contributed by Lukas

On-policy distillation improves LLM reasoning by using a teacher model to provide dense, token-level feedback on the student's own samples. Self-disti...

Knowledge distillation to context distillation

06 Feb 2026

Contributed by Lukas

We review the slow evolution of knowledge distillation, its quick adoption in LLMs, and the new wave of R&D on on-policy distillation and context ...

2015: Distilling the Knowledge in a Neural Network

06 Feb 2026

Contributed by Lukas

Bucilă et al. (2006) were doing model compression via supervised imitation on model ensembles. You train a big ensemble, then train a smaller model to...

2006 Model Compression: Ensembles

06 Feb 2026

Contributed by Lukas

The 2006 paper defined model ensembles. Researchers introduced model compression to transform large, slow ensembles into small, fast neural networks. ...

Keel: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

28 Jan 2026

Contributed by Lukas

The January 27, 2026 ByteDance paper "Post-LayerNorm Is Back: Stable, ExpressivE, and Deep" introduces the Keel architecture which addresses the optimi...

Long context: Dichotomy of Findings & Status of Research

28 Jan 2026

Contributed by Lukas

There is a sharp divergence regarding the utility of long context. Google's Gemini 1.5 research presents an optimistic view where next-token predictio...

L2M: Mutual Information Scaling Law for Long-Context Language Modeling

28 Jan 2026

Contributed by Lukas

In October 2025, in a joint collaboration between the NSF AI Institute for Artificial Intelligence and Fundamental Interactions, Massachusetts Institute...

Reasoning Models Generate Societies of Thought

28 Jan 2026

Contributed by Lukas

This January 15, 2026 joint collaboration between Google, Paradigms of Intelligence Team, University of Chicago, and Santa Fe Institute explores how ad...

SLDAgent: Evolutionary Discovery of Superhuman AI Scaling Laws

26 Jan 2026

Contributed by Lukas

The paper, titled "Can Language Models Discover Scaling Laws?" and published on January 22, 2026, represents a collaborative effort by researchers fro...

Sequoia Capital: AGI is here

24 Jan 2026

Contributed by Lukas

On January 14, 2026, Sequoia Capital published a piece asserting that Artificial General Intelligence has arrived ahead of schedule, redefined as the f...

Agentic Reasoning for Large Language Models: A Comprehensive Roadmap

24 Jan 2026

Contributed by Lukas

This January 18, 2026 massive collaboration between University of Illinois Urbana-Champaign, Meta, Amazon, Google DeepMind, UCSD and Yale explores the...

OpenAI: Scaling PostgreSQL to 800 Million ChatGPT Users

24 Jan 2026

Contributed by Lukas

OpenAI manages a massive **PostgreSQL infrastructure** to support hundreds of millions of users by utilizing a **single-primary architecture** with do...

MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic

23 Jan 2026

Contributed by Lukas

The January 6, 2026 paper introduces **MEMRL**, a framework designed to help AI agents master new skills by mimicking human **episodic memory** withou...

Google: R&D inference value on HBF + PNM + low latency interconnect

23 Jan 2026

Contributed by Lukas

To address the hardware bottlenecks of LLM inference, Google researchers Ma and Patterson propose in their paper "Challenges and Research Directions fo...

Meta's solution to massive DLRM inference through software defined memory

21 Jan 2026

Contributed by Lukas

In November 2021, Meta (then Facebook), in collaboration with George Mason University and University of Illinois Chicago, published their paper "Su...
