ARC Is a Vision Problem!
09 Dec 2025
Contributed by Lukas
In this episode, we discuss ARC Is a Vision Problem! by Keya Hu, Ali Cy, Linlu Qiu, Xiaoman Delores ...
Solving a Million-Step LLM Task with Zero Errors
09 Dec 2025
Contributed by Lukas
In this episode, we discuss Solving a Million-Step LLM Task with Zero Errors by Elliot Meyerson, Giu...
DataRater: Meta-Learned Dataset Curation
05 Dec 2025
Contributed by Lukas
In this episode, we discuss DataRater: Meta-Learned Dataset Curation by Dan A. Calian, Gregory Farqu...
Mathematical exploration and discovery at scale
15 Nov 2025
Contributed by Lukas
In this episode, we discuss Mathematical exploration and discovery at scale by Bogdan Georgiev, Javi...
Kosmos: An AI Scientist for Autonomous Discovery
12 Nov 2025
Contributed by Lukas
In this episode, we discuss Kosmos: An AI Scientist for Autonomous Discovery by Ludovico Mitchener, ...
World Simulation with Video Foundation Models for Physical AI
08 Nov 2025
Contributed by Lukas
In this episode, we discuss World Simulation with Video Foundation Models for Physical AI by NVIDIA,...
Towards Robust Mathematical Reasoning
06 Nov 2025
Contributed by Lukas
In this episode, we discuss Towards Robust Mathematical Reasoning by Thang Luong, Dawsen Hwang, Hoan...
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
04 Nov 2025
Contributed by Lukas
In this episode, we discuss ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in ...
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
28 Oct 2025
Contributed by Lukas
In this episode, we discuss Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Lan...
ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases
27 Oct 2025
Contributed by Lukas
In this episode, we discuss ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases by ...
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
27 Oct 2025
Contributed by Lukas
In this episode, we discuss Scaling Instruction-Based Video Editing with a High-Quality Synthetic Da...
Reasoning with Sampling: Your Base Model is Smarter Than You Think
23 Oct 2025
Contributed by Lukas
In this episode, we discuss Reasoning with Sampling: Your Base Model is Smarter Than You Think by Aa...
DeepSeek-OCR: Contexts Optical Compression
21 Oct 2025
Contributed by Lukas
In this episode, we discuss DeepSeek-OCR: Contexts Optical Compression by The authors of the paper a...
The Markovian Thinker
16 Oct 2025
Contributed by Lukas
In this episode, we discuss The Markovian Thinker by Milad Aghajohari, Kamran Chitsaz, Amirhossein K...
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
08 Oct 2025
Contributed by Lukas
In this episode, we discuss DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-T...
Towards a Physics Foundation Model
03 Oct 2025
Contributed by Lukas
In this episode, we discuss Towards a Physics Foundation Model by Florian Wiesner, Matthias Wessling...
Scalable Option Learning in High-Throughput Environments
30 Sep 2025
Contributed by Lukas
In this episode, we discuss Scalable Option Learning in High-Throughput Environments by Mikael Henaf...
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
24 Sep 2025
Contributed by Lukas
In this episode, we discuss Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Rein...
Reverse-Engineered Reasoning for Open-Ended Generation
19 Sep 2025
Contributed by Lukas
In this episode, we discuss Reverse-Engineered Reasoning for Open-Ended Generation by Haozhe Wang, H...
Scaling Performance of Large Language Model Pretraining
16 Sep 2025
Contributed by Lukas
In this episode, we discuss Scaling Performance of Large Language Model Pretraining by Alexander Int...
General Social Agents
15 Sep 2025
Contributed by Lukas
In this episode, we discuss General Social Agents by Benjamin S. Manning, John J. Horton. The paper ...
We need a new ethics for a world of AI agents
12 Sep 2025
Contributed by Lukas
In this episode, we discuss We need a new ethics for a world of AI agents by Iason Gabriel, Geoff Ke...
Hierarchical Reasoning Model
11 Sep 2025
Contributed by Lukas
In this episode, we discuss Hierarchical Reasoning Model by Guan Wang, Jin Li, Yuhao Sun, Xing Chen,...
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
10 Sep 2025
Contributed by Lukas
In this episode, we discuss ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Short...
Small Language Models are the Future of Agentic AI
09 Sep 2025
Contributed by Lukas
In this episode, we discuss Small Language Models are the Future of Agentic AI by Peter Belcak, Greg...
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
08 Sep 2025
Contributed by Lukas
In this episode, we discuss Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM ...
Why Language Models Hallucinate
07 Sep 2025
Contributed by Lukas
In this episode, we discuss Why Language Models Hallucinate by The authors of the paper are: - Adam...
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
19 Aug 2025
Contributed by Lukas
In this episode, we discuss Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens...
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
15 Aug 2025
Contributed by Lukas
In this episode, we discuss Learning from Reward-Free Offline Data: A Case for Planning with Latent ...
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
13 Aug 2025
Contributed by Lukas
In this episode, we discuss Persona Vectors: Monitoring and Controlling Character Traits in Language...
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
01 Aug 2025
Contributed by Lukas
In this episode, we discuss Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoni...
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
31 Jul 2025
Contributed by Lukas
In this episode, we discuss Position: The AI Conference Peer Review Crisis Demands Author Feedback a...
Working with AI: Measuring the Occupational Implications of Generative AI
31 Jul 2025
Contributed by Lukas
In this episode, we discuss Working with AI: Measuring the Occupational Implications of Generative A...
Towards physician-centered oversight of conversational diagnostic AI
30 Jul 2025
Contributed by Lukas
In this episode, we discuss Towards physician-centered oversight of conversational diagnostic AI by ...
Learning without training: The implicit dynamics of in-context learning
28 Jul 2025
Contributed by Lukas
In this episode, we discuss Learning without training: The implicit dynamics of in-context learning ...
Aime: Towards Fully-Autonomous Multi-Agent Framework
25 Jul 2025
Contributed by Lukas
In this episode, we discuss Aime: Towards Fully-Autonomous Multi-Agent Framework by Yexuan Shi, Ming...
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation
23 Jul 2025
Contributed by Lukas
In this episode, we discuss ARAG: Agentic Retrieval Augmented Generation for Personalized Recommenda...
4KAgent: Agentic Any Image to 4K Super-Resolution
18 Jul 2025
Contributed by Lukas
In this episode, we discuss 4KAgent: Agentic Any Image to 4K Super-Resolution by Yushen Zuo, Qi Zhen...
Critiques of World Models
16 Jul 2025
Contributed by Lukas
In this episode, we discuss Critiques of World Models by Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting...
Arxiv paper - Expert-level validation of AI-generated medical text with scalable language models
15 Jul 2025
Contributed by Lukas
In this episode, we discuss Expert-level validation of AI-generated medical text with scalable langu...
Arxiv paper - ImplicitQA: Going beyond frames towards Implicit Video Reasoning
11 Jul 2025
Contributed by Lukas
In this episode, we discuss ImplicitQA: Going beyond frames towards Implicit Video Reasoning by Sirn...
Arxiv paper - BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
08 Jul 2025
Contributed by Lukas
In this episode, we discuss BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing by ...
Arxiv paper - Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory
08 Jul 2025
Contributed by Lukas
In this episode, we discuss Strategic Intelligence in Large Language Models: Evidence from evolution...
Blogpost paper - Project Vend: Can Claude run a small shop? (And why does that matter?)
02 Jul 2025
Contributed by Lukas
In this episode, we discuss Project Vend: Can Claude run a small shop? (And why does that matter?) T...
Arxiv paper - Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
02 Jul 2025
Contributed by Lukas
In this episode, we discuss Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual ...
Arxiv paper - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
30 Jun 2025
Contributed by Lukas
In this episode, we discuss SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based...
Arxiv paper - OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
27 Jun 2025
Contributed by Lukas
In this episode, we discuss OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, ...
Arxiv paper - Long-Context State-Space Video World Models
25 Jun 2025
Contributed by Lukas
In this episode, we discuss Long-Context State-Space Video World Models by Ryan Po, Yotam Nitzan, Ri...
Arxiv paper - From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
24 Jun 2025
Contributed by Lukas
In this episode, we discuss From Bytes to Ideas: Language Modeling with Autoregressive U-Nets by Mat...
Arxiv paper - Reinforcement Pre-Training
20 Jun 2025
Contributed by Lukas
In this episode, we discuss Reinforcement Pre-Training by Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Y...
Arxiv paper - Token-Efficient Long Video Understanding for Multimodal LLMs
18 Jun 2025
Contributed by Lukas
In this episode, we discuss Token-Efficient Long Video Understanding for Multimodal LLMs by Jindong ...
Arxiv paper - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
11 Jun 2025
Contributed by Lukas
In this episode, we discuss The Illusion of Thinking: Understanding the Strengths and Limitations of...
Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
09 Jun 2025
Contributed by Lukas
In this episode, we discuss Vibe-Eval: A hard evaluation suite for measuring progress of multimodal ...
Arxiv paper - How much do language models memorize?
06 Jun 2025
Contributed by Lukas
In this episode, we discuss How much do language models memorize? by John X. Morris, Chawin Sitawari...
Arxiv paper - MMaDA: Multimodal Large Diffusion Language Models
03 Jun 2025
Contributed by Lukas
In this episode, we discuss MMaDA: Multimodal Large Diffusion Language Models by Ling Yang, Ye Tian,...
Arxiv paper - Superhuman performance of a large language model on the reasoning tasks of a physician
03 Jun 2025
Contributed by Lukas
In this episode, we discuss Superhuman performance of a large language model on the reasoning tasks ...
Arxiv paper - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
29 May 2025
Contributed by Lukas
In this episode, we discuss The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of ...
Arxiv paper - DanceGRPO: Unleashing GRPO on Visual Generation
28 May 2025
Contributed by Lukas
In this episode, we discuss DanceGRPO: Unleashing GRPO on Visual Generation by Zeyue Xue, Jie Wu, Yu...
Arxiv paper - Visual Planning: Let’s Think Only with Images
21 May 2025
Contributed by Lukas
In this episode, we discuss Visual Planning: Let's Think Only with Images by Yi Xu, Chengzu Li, Han ...
Arxiv paper - A Preliminary Study for GPT-4o on Image Restoration
14 May 2025
Contributed by Lukas
In this episode, we discuss A Preliminary Study for GPT-4o on Image Restoration by Hao Yang, Yan Yan...
Arxiv paper - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
12 May 2025
Contributed by Lukas
In this episode, we discuss DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoin...
Arxiv paper - RayZer: A Self-supervised Large View Synthesis Model
09 May 2025
Contributed by Lukas
In this episode, we discuss RayZer: A Self-supervised Large View Synthesis Model by Hanwen Jiang, Ha...
Arxiv paper - Reinforcement Learning for Reasoning in Large Language Models with One Training Example
08 May 2025
Contributed by Lukas
In this episode, we discuss Reinforcement Learning for Reasoning in Large Language Models with One T...
Arxiv paper - MINERVA: Evaluating Complex Video Reasoning
06 May 2025
Contributed by Lukas
In this episode, we discuss MINERVA: Evaluating Complex Video Reasoning by Arsha Nagrani, Sachit Men...
Arxiv paper - The Leaderboard Illusion
06 May 2025
Contributed by Lukas
In this episode, we discuss The Leaderboard Illusion by Shivalika Singh, Yiyang Nan, Alex Wang, Dani...
Arxiv paper - Towards Understanding Camera Motions in Any Video
05 May 2025
Contributed by Lukas
In this episode, we discuss Towards Understanding Camera Motions in Any Video by Zhiqiu Lin, Siyuan ...
Arxiv paper - Describe Anything: Detailed Localized Image and Video Captioning
29 Apr 2025
Contributed by Lukas
In this episode, we discuss Describe Anything: Detailed Localized Image and Video Captioning by Long...
Arxiv paper - MCNC: Manifold-Constrained Reparameterization for Neural Compression
28 Apr 2025
Contributed by Lukas
In this episode, we discuss MCNC: Manifold-Constrained Reparameterization for Neural Compression by ...
Arxiv paper - Self-Improving Robust Preference Optimization
23 Apr 2025
Contributed by Lukas
In this episode, we discuss Self-Improving Robust Preference Optimization by Eugene Choi, Arash Ahma...
Arxiv paper - LLM Post-Training: A Deep Dive into Reasoning Large Language Models
22 Apr 2025
Contributed by Lukas
In this episode, we discuss LLM Post-Training: A Deep Dive into Reasoning Large Language Models by K...
Arxiv paper - Welcome to the Era of Experience
21 Apr 2025
Contributed by Lukas
In this episode, we discuss Welcome to the Era of Experience by David Silver, Richard S. Sutton. The...
Arxiv paper - MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
19 Apr 2025
Contributed by Lukas
In this episode, we discuss MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Vide...
Arxiv paper - InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
17 Apr 2025
Contributed by Lukas
In this episode, we discuss InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-So...
Arxiv paper - EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise
16 Apr 2025
Contributed by Lukas
In this episode, we discuss EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent N...
Arxiv paper - TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
16 Apr 2025
Contributed by Lukas
In this episode, we discuss TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning by Xingjian...
Arxiv paper - Reasoning Models Don’t Always Say What They Think
09 Apr 2025
Contributed by Lukas
In this episode, we discuss Reasoning Models Don’t Always Say What They Think by The authors of th...
Arxiv paper - Slow-Fast Architecture for Video Multi-Modal Large Language Models
07 Apr 2025
Contributed by Lukas
In this episode, we discuss Slow-Fast Architecture for Video Multi-Modal Large Language Models by Mi...
Arxiv paper - TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
04 Apr 2025
Contributed by Lukas
In this episode, we discuss TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scene...
Arxiv paper - VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
01 Apr 2025
Contributed by Lukas
In this episode, we discuss VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning by Ye Liu, Kev...
Arxiv paper - SynCity: Training-Free Generation of 3D Worlds
28 Mar 2025
Contributed by Lukas
In this episode, we discuss SynCity: Training-Free Generation of 3D Worlds by Paul Engstler, Aleksan...
Arxiv paper - HD-EPIC: A Highly-Detailed Egocentric Video Dataset
26 Mar 2025
Contributed by Lukas
In this episode, we discuss HD-EPIC: A Highly-Detailed Egocentric Video Dataset by Toby Perrett, Ahm...
Arxiv paper - Video-T1: Test-Time Scaling for Video Generation
25 Mar 2025
Contributed by Lukas
In this episode, we discuss Video-T1: Test-Time Scaling for Video Generation by Fangfu Liu, Hanyang ...
Arxiv paper - Calibrated Multi-Preference Optimization for Aligning Diffusion Models
24 Mar 2025
Contributed by Lukas
In this episode, we discuss Calibrated Multi-Preference Optimization for Aligning Diffusion Models b...
Arxiv paper - Personalize Anything for Free with Diffusion Transformer
21 Mar 2025
Contributed by Lukas
In this episode, we discuss Personalize Anything for Free with Diffusion Transformer by Haoran Feng,...
Arxiv paper - Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
20 Mar 2025
Contributed by Lukas
In this episode, we discuss Story-Adapter: A Training-free Iterative Framework for Long Story Visual...
Arxiv paper - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
18 Mar 2025
Contributed by Lukas
In this episode, we discuss ReCamMaster: Camera-Controlled Generative Rendering from A Single Video ...
Arxiv paper - Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
17 Mar 2025
Contributed by Lukas
In this episode, we discuss Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Langua...
Arxiv paper - MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
13 Mar 2025
Contributed by Lukas
In this episode, we discuss MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks b...
Arxiv paper - TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
12 Mar 2025
Contributed by Lukas
In this episode, we discuss TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos vi...
Arxiv paper - PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
11 Mar 2025
Contributed by Lukas
In this episode, we discuss PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning T...
Arxiv paper - VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
08 Mar 2025
Contributed by Lukas
In this episode, we discuss VideoGrain: Modulating Space-Time Attention for Multi-grained Video Edit...
Arxiv paper - ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
04 Mar 2025
Contributed by Lukas
In this episode, we discuss ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimo...
Arxiv paper - Teaching Language Models to Critique via Reinforcement Learning
03 Mar 2025
Contributed by Lukas
In this episode, we discuss Teaching Language Models to Critique via Reinforcement Learning by Zhihu...
Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
27 Feb 2025
Contributed by Lukas
In this episode, we discuss PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negat...
Arxiv paper - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
24 Feb 2025
Contributed by Lukas
In this episode, we discuss VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Gener...
Arxiv paper - Heuristically Adaptive Diffusion-Model Evolutionary Strategy
22 Feb 2025
Contributed by Lukas
In this episode, we discuss Heuristically Adaptive Diffusion-Model Evolutionary Strategy by Benedikt...
Arxiv paper - Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
20 Feb 2025
Contributed by Lukas
In this episode, we discuss Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Ap...
Arxiv paper - EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
19 Feb 2025
Contributed by Lukas
In this episode, we discuss EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Mod...
Arxiv paper - VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
14 Feb 2025
Contributed by Lukas
In this episode, we discuss VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained V...
Arxiv paper - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
13 Feb 2025
Contributed by Lukas
In this episode, we discuss VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Ge...