AI Breakdown
Episodes
ARC Is a Vision Problem!
09 Dec 2025
Contributed by Lukas
In this episode, we discuss ARC Is a Vision Problem! by Keya Hu, Ali Cy, Linlu Qiu, Xiaoman Delores Ding, Runqian Wang, Yeyin Eva Zhu, Jacob Andreas, ...
Solving a Million-Step LLM Task with Zero Errors
09 Dec 2025
Contributed by Lukas
In this episode, we discuss Solving a Million-Step LLM Task with Zero Errors by Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Oliv...
DataRater: Meta-Learned Dataset Curation
05 Dec 2025
Contributed by Lukas
In this episode, we discuss DataRater: Meta-Learned Dataset Curation by Dan A. Calian, Gregory Farquhar, Iurii Kemaev, Luisa M. Zintgraf, Matteo Hesse...
Mathematical exploration and discovery at scale
15 Nov 2025
Contributed by Lukas
In this episode, we discuss Mathematical exploration and discovery at scale by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner....
Kosmos: An AI Scientist for Autonomous Discovery
12 Nov 2025
Contributed by Lukas
In this episode, we discuss Kosmos: An AI Scientist for Autonomous Discovery by Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyle...
World Simulation with Video Foundation Models for Physical AI
08 Nov 2025
Contributed by Lukas
In this episode, we discuss World Simulation with Video Foundation Models for Physical AI by NVIDIA, Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Ba...
Towards Robust Mathematical Reasoning
06 Nov 2025
Contributed by Lukas
In this episode, we discuss Towards Robust Mathematical Reasoning by Thang Luong, Dawsen Hwang, Hoang H. Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk ...
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
04 Nov 2025
Contributed by Lukas
In this episode, we discuss ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models by Mingjie Liu, Shizhe Diao,...
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
28 Oct 2025
Contributed by Lukas
In this episode, we discuss Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models by Peter Robicheaux, Matvei Popov, An...
ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases
27 Oct 2025
Contributed by Lukas
In this episode, we discuss ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases by Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini....
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
27 Oct 2025
Contributed by Lukas
In this episode, we discuss Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset by Qingyan Bai, Qiuyu Wang, Hao Ouyang, Yue ...
Reasoning with Sampling: Your Base Model is Smarter Than You Think
23 Oct 2025
Contributed by Lukas
In this episode, we discuss Reasoning with Sampling: Your Base Model is Smarter Than You Think by Aayush Karan, Yilun Du. The paper proposes a novel i...
DeepSeek-OCR: Contexts Optical Compression
21 Oct 2025
Contributed by Lukas
In this episode, we discuss DeepSeek-OCR: Contexts Optical Compression by Haoran Wei, Yaofeng Sun, Yukun Li. DeepSe...
The Markovian Thinker
16 Oct 2025
Contributed by Lukas
In this episode, we discuss The Markovian Thinker by Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aar...
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
08 Oct 2025
Contributed by Lukas
In this episode, we discuss DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL by Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen ...
Towards a Physics Foundation Model
03 Oct 2025
Contributed by Lukas
In this episode, we discuss Towards a Physics Foundation Model by Florian Wiesner, Matthias Wessling, Stephen Baek. This paper introduces the General ...
Scalable Option Learning in High-Throughput Environments
30 Sep 2025
Contributed by Lukas
In this episode, we discuss Scalable Option Learning in High-Throughput Environments by Mikael Henaff, Scott Fujimoto, Michael Rabbat. The paper prese...
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
24 Sep 2025
Contributed by Lukas
In this episode, we discuss Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning by Shenzhi Wa...
Reverse-Engineered Reasoning for Open-Ended Generation
19 Sep 2025
Contributed by Lukas
In this episode, we discuss Reverse-Engineered Reasoning for Open-Ended Generation by Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou...
Scaling Performance of Large Language Model Pretraining
16 Sep 2025
Contributed by Lukas
In this episode, we discuss Scaling Performance of Large Language Model Pretraining by Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, ...
General Social Agents
15 Sep 2025
Contributed by Lukas
In this episode, we discuss General Social Agents by Benjamin S. Manning, John J. Horton. The paper proposes using AI agents guided by social science ...
We need a new ethics for a world of AI agents
12 Sep 2025
Contributed by Lukas
In this episode, we discuss We need a new ethics for a world of AI agents by Iason Gabriel, Geoff Keeling, Arianna Manzini & James Evans. The pape...
Hierarchical Reasoning Model
11 Sep 2025
Contributed by Lukas
In this episode, we discuss Hierarchical Reasoning Model by Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin A...
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
10 Sep 2025
Contributed by Lukas
In this episode, we discuss ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts by Yuying Ge, Yixiao Ge, Chen Li, Teng Wang, Jun...
Small Language Models are the Future of Agentic AI
09 Sep 2025
Contributed by Lukas
In this episode, we discuss Small Language Models are the Future of Agentic AI by Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saur...
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
08 Sep 2025
Contributed by Lukas
In this episode, we discuss Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents by Davide Paglieri, Bartłomiej Cupiał, Jo...
Why Language Models Hallucinate
07 Sep 2025
Contributed by Lukas
In this episode, we discuss Why Language Models Hallucinate by Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempa...
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
19 Aug 2025
Contributed by Lukas
In this episode, we discuss Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens by Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei...
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
15 Aug 2025
Contributed by Lukas
In this episode, we discuss Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models by Vlad Sobal, Wancong Zhang, Kyun...
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
13 Aug 2025
Contributed by Lukas
In this episode, we discuss Persona Vectors: Monitoring and Controlling Character Traits in Language Models by Runjin Chen, Andy Arditi, Henry Sleight...
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
01 Aug 2025
Contributed by Lukas
In this episode, we discuss Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning by Jaedong Hwang, Kumar Tanmay, Seok-Jin Lee, A...
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
31 Jul 2025
Contributed by Lukas
In this episode, we discuss Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards by Jaeho Kim, Yunseok Lee, Seu...
Working with AI: Measuring the Occupational Implications of Generative AI
31 Jul 2025
Contributed by Lukas
In this episode, we discuss Working with AI: Measuring the Occupational Implications of Generative AI by Kiran Tomlinson, Sonia Jaffe, Will Wang, Scot...
Towards physician-centered oversight of conversational diagnostic AI
30 Jul 2025
Contributed by Lukas
In this episode, we discuss Towards physician-centered oversight of conversational diagnostic AI by Elahe Vedadi, David Barrett, Natalie Harris, Eller...
Learning without training: The implicit dynamics of in-context learning
28 Jul 2025
Contributed by Lukas
In this episode, we discuss Learning without training: The implicit dynamics of in-context learning by Benoit Dherin, Michael Munn, Hanna Mazzawi, Mic...
Aime: Towards Fully-Autonomous Multi-Agent Framework
25 Jul 2025
Contributed by Lukas
In this episode, we discuss Aime: Towards Fully-Autonomous Multi-Agent Framework by Yexuan Shi, Mingyu Wang, Yunxiang Cao, Hongjie Lai, Junjian Lan, X...
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation
23 Jul 2025
Contributed by Lukas
In this episode, we discuss ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation by Reza Yousefi Maragheh, Pratheek Vadla, Pri...
4KAgent: Agentic Any Image to 4K Super-Resolution
18 Jul 2025
Contributed by Lukas
In this episode, we discuss 4KAgent: Agentic Any Image to 4K Super-Resolution by Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang...
Critiques of World Models
16 Jul 2025
Contributed by Lukas
In this episode, we discuss Critiques of World Models by Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu. The paper critiques existing approaches to wo...
Arxiv paper - Expert-level validation of AI-generated medical text with scalable language models
15 Jul 2025
Contributed by Lukas
In this episode, we discuss Expert-level validation of AI-generated medical text with scalable language models by Asad Aali, Vasiliki Bikia, Maya Varm...
Arxiv paper - ImplicitQA: Going beyond frames towards Implicit Video Reasoning
11 Jul 2025
Contributed by Lukas
In this episode, we discuss ImplicitQA: Going beyond frames towards Implicit Video Reasoning by Sirnam Swetha, Rohit Gupta, Parth Parag Kulkarni, Davi...
Arxiv paper - BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
08 Jul 2025
Contributed by Lukas
In this episode, we discuss BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing by Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xi...
Arxiv paper - Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory
08 Jul 2025
Contributed by Lukas
In this episode, we discuss Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory by Kenneth Payne, Baptiste Alloui-...
Blogpost paper - Project Vend: Can Claude run a small shop? (And why does that matter?)
02 Jul 2025
Contributed by Lukas
In this episode, we discuss Project Vend: Can Claude run a small shop? (And why does that matter?) The paper describes a month-long experiment where t...
Arxiv paper - Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
02 Jul 2025
Contributed by Lukas
In this episode, we discuss Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens by Zeyuan Yang, Xueyang Yu, Delin Chen, Mao...
Arxiv paper - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
30 Jun 2025
Contributed by Lukas
In this episode, we discuss SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing by Ming Li, Xin Gu, Fan Chen, Xiaoy...
Arxiv paper - OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
27 Jun 2025
Contributed by Lukas
In this episode, we discuss OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization b...
Arxiv paper - Long-Context State-Space Video World Models
25 Jun 2025
Contributed by Lukas
In this episode, we discuss Long-Context State-Space Video World Models by Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, ...
Arxiv paper - From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
24 Jun 2025
Contributed by Lukas
In this episode, we discuss From Bytes to Ideas: Language Modeling with Autoregressive U-Nets by Mathurin Videau, Badr Youbi Idrissi, Alessandro Leite...
Arxiv paper - Reinforcement Pre-Training
20 Jun 2025
Contributed by Lukas
In this episode, we discuss Reinforcement Pre-Training by Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Ye, Yutao Sun, Zhifang Sui, Furu Wei. The paper int...
Arxiv paper - Token-Efficient Long Video Understanding for Multimodal LLMs
18 Jun 2025
Contributed by Lukas
In this episode, we discuss Token-Efficient Long Video Understanding for Multimodal LLMs by Jindong Jiang, Xiuyu Li, Zhijian Liu, Muyang Li, Guo Chen,...
Arxiv paper - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
11 Jun 2025
Contributed by Lukas
In this episode, we discuss The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexi...
Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
09 Jun 2025
Contributed by Lukas
In this episode, we discuss Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models by Piotr Padlewski, Max Bain, Matt...
Arxiv paper - How much do language models memorize?
06 Jun 2025
Contributed by Lukas
In this episode, we discuss How much do language models memorize? by John X. Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G. Edward Suh, Al...
Arxiv paper - MMaDA: Multimodal Large Diffusion Language Models
03 Jun 2025
Contributed by Lukas
In this episode, we discuss MMaDA: Multimodal Large Diffusion Language Models by Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Me...
Arxiv paper - Superhuman performance of a large language model on the reasoning tasks of a physician
03 Jun 2025
Contributed by Lukas
In this episode, we discuss Superhuman performance of a large language model on the reasoning tasks of a physician by Peter G. Brodeur, Thomas A. Buck...
Arxiv paper - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
29 May 2025
Contributed by Lukas
In this episode, we discuss The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models by Seungone K...
Arxiv paper - DanceGRPO: Unleashing GRPO on Visual Generation
28 May 2025
Contributed by Lukas
In this episode, we discuss DanceGRPO: Unleashing GRPO on Visual Generation by Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, ...
Arxiv paper - Visual Planning: Let’s Think Only with Images
21 May 2025
Contributed by Lukas
In this episode, we discuss Visual Planning: Let's Think Only with Images by Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang, Anna Korhonen, Iv...
Arxiv paper - A Preliminary Study for GPT-4o on Image Restoration
14 May 2025
Contributed by Lukas
In this episode, we discuss A Preliminary Study for GPT-4o on Image Restoration by Hao Yang, Yan Yang, Ruikun Zhang, Liyuan Pan. This paper presents t...
Arxiv paper - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
12 May 2025
Contributed by Lukas
In this episode, we discuss DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion by Qitao Zhao, Amy Lin, Jeff Tan, Jaso...
Arxiv paper - RayZer: A Self-supervised Large View Synthesis Model
09 May 2025
Contributed by Lukas
In this episode, we discuss RayZer: A Self-supervised Large View Synthesis Model by Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai...
Arxiv paper - Reinforcement Learning for Reasoning in Large Language Models with One Training Example
08 May 2025
Contributed by Lukas
In this episode, we discuss Reinforcement Learning for Reasoning in Large Language Models with One Training Example by Yiping Wang, Qing Yang, Zhiyuan...
Arxiv paper - MINERVA: Evaluating Complex Video Reasoning
06 May 2025
Contributed by Lukas
In this episode, we discuss MINERVA: Evaluating Complex Video Reasoning by Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa...
Arxiv paper - The Leaderboard Illusion
06 May 2025
Contributed by Lukas
In this episode, we discuss The Leaderboard Illusion by Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D'Souza, Sayash Kapoor, Ahmet Üstün, Sanmi Ko...
Arxiv paper - Towards Understanding Camera Motions in Any Video
05 May 2025
Contributed by Lukas
In this episode, we discuss Towards Understanding Camera Motions in Any Video by Zhiqiu Lin, Siyuan Cen, Daniel Jiang, Jay Karhade, Hewei Wang, Chanch...
Arxiv paper - Describe Anything: Detailed Localized Image and Video Captioning
29 Apr 2025
Contributed by Lukas
In this episode, we discuss Describe Anything: Detailed Localized Image and Video Captioning by Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao...
Arxiv paper - MCNC: MANIFOLD-CONSTRAINED REPARAMETERIZATION FOR NEURAL COMPRESSION
28 Apr 2025
Contributed by Lukas
In this episode, we discuss MCNC: MANIFOLD-CONSTRAINED REPARAMETERIZATION FOR NEURAL COMPRESSION by Chayne Thrash, Al...
Arxiv paper - Self-Improving Robust Preference Optimization
23 Apr 2025
Contributed by Lukas
In this episode, we discuss Self-Improving Robust Preference Optimization by Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, Mohammad G...
Arxiv paper - LLM Post-Training: A Deep Dive into Reasoning Large Language Models
22 Apr 2025
Contributed by Lukas
In this episode, we discuss LLM Post-Training: A Deep Dive into Reasoning Large Language Models by Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Mu...
Arxiv paper - Welcome to the Era of Experience
21 Apr 2025
Contributed by Lukas
In this episode, we discuss Welcome to the Era of Experience by David Silver, Richard S. Sutton. The paper discusses the forthcoming era of artificial...
Arxiv paper - MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
19 Apr 2025
Contributed by Lukas
In this episode, we discuss MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation by Sihyun Yu, Meera Hahn, Dan Kondrat...
Arxiv paper - InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
17 Apr 2025
Contributed by Lukas
In this episode, we discuss InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models by The authors of the paper...
Arxiv paper - EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise
16 Apr 2025
Contributed by Lukas
In this episode, we discuss EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise by Chao Li...
Arxiv paper - TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
16 Apr 2025
Contributed by Lukas
In this episode, we discuss TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning by Xingjian Zhang, Siwei Wen, Wenjun Wu, Lei Huang. The paper...
Arxiv paper - Reasoning Models Don’t Always Say What They Think
09 Apr 2025
Contributed by Lukas
In this episode, we discuss Reasoning Models Don’t Always Say What They Think by The authors of the paper "Reasoning Models Don’t Always Say What ...
Arxiv paper - Slow-Fast Architecture for Video Multi-Modal Large Language Models
07 Apr 2025
Contributed by Lukas
In this episode, we discuss Slow-Fast Architecture for Video Multi-Modal Large Language Models by Min Shi, Shihao Wang, Chieh-Yun Chen, Jitesh Jain, K...
Arxiv paper - TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
04 Apr 2025
Contributed by Lukas
In this episode, we discuss TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes by Nikai Du, Zhennan Chen, Zhizhou Chen, Shan Ga...
Arxiv paper - VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
01 Apr 2025
Contributed by Lukas
In this episode, we discuss VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning by Ye Liu, Kevin Qinghong Lin, Chang Wen Chen, Mike Zheng Shou. ...
Arxiv paper - SynCity: Training-Free Generation of 3D Worlds
28 Mar 2025
Contributed by Lukas
In this episode, we discuss SynCity: Training-Free Generation of 3D Worlds by Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, A...
Arxiv paper - HD-EPIC: A Highly-Detailed Egocentric Video Dataset
26 Mar 2025
Contributed by Lukas
In this episode, we discuss HD-EPIC: A Highly-Detailed Egocentric Video Dataset by Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pol...
Arxiv paper - Video-T1: Test-Time Scaling for Video Generation
25 Mar 2025
Contributed by Lukas
In this episode, we discuss Video-T1: Test-Time Scaling for Video Generation by Fangfu Liu, Hanyang Wang, Yimo Cai, Kaiyan Zhang, Xiaohang Zhan, Yueqi...
Arxiv paper - Calibrated Multi-Preference Optimization for Aligning Diffusion Models
24 Mar 2025
Contributed by Lukas
In this episode, we discuss Calibrated Multi-Preference Optimization for Aligning Diffusion Models by Kyungmin Lee, Xiaohang Li, Qifei Wang, Junfeng H...
Arxiv paper - Personalize Anything for Free with Diffusion Transformer
21 Mar 2025
Contributed by Lukas
In this episode, we discuss Personalize Anything for Free with Diffusion Transformer by Haoran Feng, Zehuan Huang, Lin Li, Hairong Lv, Lu Sheng. The p...
Arxiv paper - Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
20 Mar 2025
Contributed by Lukas
In this episode, we discuss Story-Adapter: A Training-free Iterative Framework for Long Story Visualization by Jiawei Mao, Xiaoke Huang, Yunfei Xie, Y...
Arxiv paper - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
18 Mar 2025
Contributed by Lukas
In this episode, we discuss ReCamMaster: Camera-Controlled Generative Rendering from A Single Video by Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang...
Arxiv paper - Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
17 Mar 2025
Contributed by Lukas
In this episode, we discuss Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models by Wenxuan Huang, Bohan Jia, Zijie Zhai,...
Arxiv paper - MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
13 Mar 2025
Contributed by Lukas
In this episode, we discuss MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks by Jiacheng Chen, Tianhao Liang, Sherman Siu, Zheng...
Arxiv paper - TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
12 Mar 2025
Contributed by Lukas
In this episode, we discuss TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models by Mark YU, Wenbo Hu, Jinbo Xin...
Arxiv paper - PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
11 Mar 2025
Contributed by Lukas
In this episode, we discuss PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving by Mihir P...
Arxiv paper - VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
08 Mar 2025
Contributed by Lukas
In this episode, we discuss VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing by Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Y...
Arxiv paper - ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
04 Mar 2025
Contributed by Lukas
In this episode, we discuss ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models by Jonathan Roberts, Mohammad Reza Taes...
Arxiv paper - Teaching Language Models to Critique via Reinforcement Learning
03 Mar 2025
Contributed by Lukas
In this episode, we discuss Teaching Language Models to Critique via Reinforcement Learning by Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing ...
Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
27 Feb 2025
Contributed by Lukas
In this episode, we discuss PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling by Avery ...
Arxiv paper - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
24 Feb 2025
Contributed by Lukas
In this episode, we discuss VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation by Sixiao Zheng, Zimian Peng, Yanpeng Zhou, ...
Arxiv paper - Heuristically Adaptive Diffusion-Model Evolutionary Strategy
22 Feb 2025
Contributed by Lukas
In this episode, we discuss Heuristically Adaptive Diffusion-Model Evolutionary Strategy by Benedikt Hartl, Yanbo Zhang, Hananel Hazan, Michael Levin....
Arxiv paper - Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
20 Feb 2025
Contributed by Lukas
In this episode, we discuss Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach by Jonas Geiping, Sean McLeish, Neel Jain, ...
Arxiv paper - EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
19 Feb 2025
Contributed by Lukas
In this episode, we discuss EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents by Rui Yang,...
Arxiv paper - VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
14 Feb 2025
Contributed by Lukas
In this episode, we discuss VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection by Songhao...
Arxiv paper - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
13 Feb 2025
Contributed by Lukas
In this episode, we discuss VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models by Hila Chefer, Uriel Sin...