AI Breakdown

Beyond Language Modeling: An Exploration of Multimodal Pretraining

06 Mar 2026

Contributed by Lukas

In this episode, we discuss Beyond Language Modeling: An Exploration of Multimodal Pretraining by Shengbang Tong, David Fan, John Nguyen, Ellis Brown,...

Mode Seeking meets Mean Seeking for Fast Long Video Generation

04 Mar 2026

Contributed by Lukas

In this episode, we discuss Mode Seeking meets Mean Seeking for Fast Long Video Generation by Shengqu Cai, Weili Nie, Chao Liu, Julius Berner, Lvmin Z...

Recursive Language Models

04 Mar 2026

Contributed by Lukas

In this episode, we discuss Recursive Language Models by Alex L. Zhang, Tim Kraska, Omar Khattab. The paper introduces Recursive Language Models (RLMs...

PaperBanana: Automating Academic Illustration for AI Scientists

10 Feb 2026

Contributed by Lukas

In this episode, we discuss PaperBanana: Automating Academic Illustration for AI Scientists by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, To...

World-Gymnast: Training Robots with Reinforcement Learning in a World Model

10 Feb 2026

Contributed by Lukas

In this episode, we discuss World-Gymnast: Training Robots with Reinforcement Learning in a World Model by Ansh Kumar Sharma, Yixiang Sun, Ninghao Lu,...

Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

29 Jan 2026

Contributed by Lukas

In this episode, we discuss Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory by Dohun Lee, Chun-Hao Paul Huang, Xuelin Chen, Jong Ch...

Self-Rewarding Language Models

08 Jan 2026

Contributed by Lukas

In this episode, we discuss Self-Rewarding Language Models by Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu...

On the generalization of language models from in-context learning and finetuning: a controlled study

05 Jan 2026

Contributed by Lukas

In this episode, we discuss On the generalization of language models from in-context learning and finetuning: a controlled study by Andrew K. Lampinen...

OpenThoughts: Data Recipes for Reasoning Models

16 Dec 2025

Contributed by Lukas

In this episode, we discuss OpenThoughts: Data Recipes for Reasoning Models by Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hr...

Nested Learning: The Illusion of Deep Learning Architecture

13 Dec 2025

Contributed by Lukas

In this episode, we discuss Nested Learning: The Illusion of Deep Learning Architecture by The authors of the paper "Nested Learning: The Illusion of ...

ARC Is a Vision Problem!

09 Dec 2025

Contributed by Lukas

In this episode, we discuss ARC Is a Vision Problem! by Keya Hu, Ali Cy, Linlu Qiu, Xiaoman Delores Ding, Runqian Wang, Yeyin Eva Zhu, Jacob Andreas, ...

Solving a Million-Step LLM Task with Zero Errors

09 Dec 2025

Contributed by Lukas

In this episode, we discuss Solving a Million-Step LLM Task with Zero Errors by Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Oliv...

DataRater: Meta-Learned Dataset Curation

05 Dec 2025

Contributed by Lukas

In this episode, we discuss DataRater: Meta-Learned Dataset Curation by Dan A. Calian, Gregory Farquhar, Iurii Kemaev, Luisa M. Zintgraf, Matteo Hesse...

Mathematical exploration and discovery at scale

15 Nov 2025

Contributed by Lukas

In this episode, we discuss Mathematical exploration and discovery at scale by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner....

Kosmos: An AI Scientist for Autonomous Discovery

12 Nov 2025

Contributed by Lukas

In this episode, we discuss Kosmos: An AI Scientist for Autonomous Discovery by Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyle...

World Simulation with Video Foundation Models for Physical AI

08 Nov 2025

Contributed by Lukas

In this episode, we discuss World Simulation with Video Foundation Models for Physical AI by NVIDIA: Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Ba...

Towards Robust Mathematical Reasoning

06 Nov 2025

Contributed by Lukas

In this episode, we discuss Towards Robust Mathematical Reasoning by Thang Luong, Dawsen Hwang, Hoang H. Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk ...

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

04 Nov 2025

Contributed by Lukas

In this episode, we discuss ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models by Mingjie Liu, Shizhe Diao,...

Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models

28 Oct 2025

Contributed by Lukas

In this episode, we discuss Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models by Peter Robicheaux, Matvei Popov, An...

ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases

27 Oct 2025

Contributed by Lukas

In this episode, we discuss ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases by Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini....

Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

27 Oct 2025

Contributed by Lukas

In this episode, we discuss Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset by Qingyan Bai, Qiuyu Wang, Hao Ouyang, Yue ...

Reasoning with Sampling: Your Base Model is Smarter Than You Think

23 Oct 2025

Contributed by Lukas

In this episode, we discuss Reasoning with Sampling: Your Base Model is Smarter Than You Think by Aayush Karan, Yilun Du. The paper proposes a novel i...

DeepSeek-OCR: Contexts Optical Compression

21 Oct 2025

Contributed by Lukas

In this episode, we discuss DeepSeek-OCR: Contexts Optical Compression by Haoran Wei, Yaofeng Sun, Yukun Li. DeepSe...

The Markovian Thinker

16 Oct 2025

Contributed by Lukas

In this episode, we discuss The Markovian Thinker by Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aar...

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

08 Oct 2025

Contributed by Lukas

In this episode, we discuss DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL by Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen ...

Towards a Physics Foundation Model

03 Oct 2025

Contributed by Lukas

In this episode, we discuss Towards a Physics Foundation Model by Florian Wiesner, Matthias Wessling, Stephen Baek. This paper introduces the General ...

Scalable Option Learning in High-Throughput Environments

30 Sep 2025

Contributed by Lukas

In this episode, we discuss Scalable Option Learning in High-Throughput Environments by Mikael Henaff, Scott Fujimoto, Michael Rabbat. The paper prese...

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

24 Sep 2025

Contributed by Lukas

In this episode, we discuss Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning by Shenzhi Wa...

Reverse-Engineered Reasoning for Open-Ended Generation

19 Sep 2025

Contributed by Lukas

In this episode, we discuss Reverse-Engineered Reasoning for Open-Ended Generation by Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou...

Scaling Performance of Large Language Model Pretraining

16 Sep 2025

Contributed by Lukas

In this episode, we discuss Scaling Performance of Large Language Model Pretraining by Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, ...

General Social Agents

15 Sep 2025

Contributed by Lukas

In this episode, we discuss General Social Agents by Benjamin S. Manning, John J. Horton. The paper proposes using AI agents guided by social science ...

We need a new ethics for a world of AI agents

12 Sep 2025

Contributed by Lukas

In this episode, we discuss We need a new ethics for a world of AI agents by Iason Gabriel, Geoff Keeling, Arianna Manzini & James Evans. The pape...

Hierarchical Reasoning Model

11 Sep 2025

Contributed by Lukas

In this episode, we discuss Hierarchical Reasoning Model by Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin A...

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

10 Sep 2025

Contributed by Lukas

In this episode, we discuss ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts by Yuying Ge, Yixiao Ge, Chen Li, Teng Wang, Jun...

Small Language Models are the Future of Agentic AI

09 Sep 2025

Contributed by Lukas

In this episode, we discuss Small Language Models are the Future of Agentic AI by Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saur...

Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents

08 Sep 2025

Contributed by Lukas

In this episode, we discuss Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents by Davide Paglieri, Bartłomiej Cupiał, Jo...

Why Language Models Hallucinate

07 Sep 2025

Contributed by Lukas

In this episode, we discuss Why Language Models Hallucinate by Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempa...

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

19 Aug 2025

Contributed by Lukas

In this episode, we discuss Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens by Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei...

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

15 Aug 2025

Contributed by Lukas

In this episode, we discuss Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models by Vlad Sobal, Wancong Zhang, Kyun...

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

13 Aug 2025

Contributed by Lukas

In this episode, we discuss Persona Vectors: Monitoring and Controlling Character Traits in Language Models by Runjin Chen, Andy Arditi, Henry Sleight...

Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning

01 Aug 2025

Contributed by Lukas

In this episode, we discuss Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning by Jaedong Hwang, Kumar Tanmay, Seok-Jin Lee, A...

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards

31 Jul 2025

Contributed by Lukas

In this episode, we discuss Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards by Jaeho Kim, Yunseok Lee, Seu...

Working with AI: Measuring the Occupational Implications of Generative AI

31 Jul 2025

Contributed by Lukas

In this episode, we discuss Working with AI: Measuring the Occupational Implications of Generative AI by Kiran Tomlinson, Sonia Jaffe, Will Wang, Scot...

Towards physician-centered oversight of conversational diagnostic AI

30 Jul 2025

Contributed by Lukas

In this episode, we discuss Towards physician-centered oversight of conversational diagnostic AI by Elahe Vedadi, David Barrett, Natalie Harris, Eller...

Learning without training: The implicit dynamics of in-context learning

28 Jul 2025

Contributed by Lukas

In this episode, we discuss Learning without training: The implicit dynamics of in-context learning by Benoit Dherin, Michael Munn, Hanna Mazzawi, Mic...

Aime: Towards Fully-Autonomous Multi-Agent Framework

25 Jul 2025

Contributed by Lukas

In this episode, we discuss Aime: Towards Fully-Autonomous Multi-Agent Framework by Yexuan Shi, Mingyu Wang, Yunxiang Cao, Hongjie Lai, Junjian Lan, X...

ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation

23 Jul 2025

Contributed by Lukas

In this episode, we discuss ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation by Reza Yousefi Maragheh, Pratheek Vadla, Pri...

4KAgent: Agentic Any Image to 4K Super-Resolution

18 Jul 2025

Contributed by Lukas

In this episode, we discuss 4KAgent: Agentic Any Image to 4K Super-Resolution by Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang...

Critiques of World Models

16 Jul 2025

Contributed by Lukas

In this episode, we discuss Critiques of World Models by Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu. The paper critiques existing approaches to wo...

Arxiv paper - Expert-level validation of AI-generated medical text with scalable language models

15 Jul 2025

Contributed by Lukas

In this episode, we discuss Expert-level validation of AI-generated medical text with scalable language models by Asad Aali, Vasiliki Bikia, Maya Varm...

Arxiv paper - ImplicitQA: Going beyond frames towards Implicit Video Reasoning

11 Jul 2025

Contributed by Lukas

In this episode, we discuss ImplicitQA: Going beyond frames towards Implicit Video Reasoning by Sirnam Swetha, Rohit Gupta, Parth Parag Kulkarni, Davi...

Arxiv paper - BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

08 Jul 2025

Contributed by Lukas

In this episode, we discuss BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing by Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xi...

Arxiv paper - Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory

08 Jul 2025

Contributed by Lukas

In this episode, we discuss Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory by Kenneth Payne, Baptiste Alloui-...

Blogpost paper - Project Vend: Can Claude run a small shop? (And why does that matter?)

02 Jul 2025

Contributed by Lukas

In this episode, we discuss Project Vend: Can Claude run a small shop? (And why does that matter?) The paper describes a month-long experiment where t...

Arxiv paper - Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

02 Jul 2025

Contributed by Lukas

In this episode, we discuss Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens by Zeyuan Yang, Xueyang Yu, Delin Chen, Mao...

Arxiv paper - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

30 Jun 2025

Contributed by Lukas

In this episode, we discuss SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing by Ming Li, Xin Gu, Fan Chen, Xiaoy...

Arxiv paper - OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

27 Jun 2025

Contributed by Lukas

In this episode, we discuss OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization b...

Arxiv paper - Long-Context State-Space Video World Models

25 Jun 2025

Contributed by Lukas

In this episode, we discuss Long-Context State-Space Video World Models by Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, ...

Arxiv paper - From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

24 Jun 2025

Contributed by Lukas

In this episode, we discuss From Bytes to Ideas: Language Modeling with Autoregressive U-Nets by Mathurin Videau, Badr Youbi Idrissi, Alessandro Leite...

Arxiv paper - Reinforcement Pre-Training

20 Jun 2025

Contributed by Lukas

In this episode, we discuss Reinforcement Pre-Training by Qingxiu Dong, Li Dong, Yao Tang, Tianzhu Ye, Yutao Sun, Zhifang Sui, Furu Wei. The paper int...

Arxiv paper - Token-Efficient Long Video Understanding for Multimodal LLMs

18 Jun 2025

Contributed by Lukas

In this episode, we discuss Token-Efficient Long Video Understanding for Multimodal LLMs by Jindong Jiang, Xiuyu Li, Zhijian Liu, Muyang Li, Guo Chen,...

Arxiv paper - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

11 Jun 2025

Contributed by Lukas

In this episode, we discuss The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexi...

Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

09 Jun 2025

Contributed by Lukas

In this episode, we discuss Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models by Piotr Padlewski, Max Bain, Matt...

Arxiv paper - How much do language models memorize?

06 Jun 2025

Contributed by Lukas

In this episode, we discuss How much do language models memorize? by John X. Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G. Edward Suh, Al...

Arxiv paper - MMaDA: Multimodal Large Diffusion Language Models

03 Jun 2025

Contributed by Lukas

In this episode, we discuss MMaDA: Multimodal Large Diffusion Language Models by Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Me...

Arxiv paper - Superhuman performance of a large language model on the reasoning tasks of a physician

03 Jun 2025

Contributed by Lukas

In this episode, we discuss Superhuman performance of a large language model on the reasoning tasks of a physician by Peter G. Brodeur, Thomas A. Buck...

Arxiv paper - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

29 May 2025

Contributed by Lukas

In this episode, we discuss The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models by Seungone K...

Arxiv paper - DanceGRPO: Unleashing GRPO on Visual Generation

28 May 2025

Contributed by Lukas

In this episode, we discuss DanceGRPO: Unleashing GRPO on Visual Generation by Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, ...

Arxiv paper - Visual Planning: Let’s Think Only with Images

21 May 2025

Contributed by Lukas

In this episode, we discuss Visual Planning: Let's Think Only with Images by Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang, Anna Korhonen, Iv...

Arxiv paper - A Preliminary Study for GPT-4o on Image Restoration

14 May 2025

Contributed by Lukas

In this episode, we discuss A Preliminary Study for GPT-4o on Image Restoration by Hao Yang, Yan Yang, Ruikun Zhang, Liyuan Pan. This paper presents t...

Arxiv paper - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion

12 May 2025

Contributed by Lukas

In this episode, we discuss DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion by Qitao Zhao, Amy Lin, Jeff Tan, Jaso...

Arxiv paper - RayZer: A Self-supervised Large View Synthesis Model

09 May 2025

Contributed by Lukas

In this episode, we discuss RayZer: A Self-supervised Large View Synthesis Model by Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai...

Arxiv paper - Reinforcement Learning for Reasoning in Large Language Models with One Training Example

08 May 2025

Contributed by Lukas

In this episode, we discuss Reinforcement Learning for Reasoning in Large Language Models with One Training Example by Yiping Wang, Qing Yang, Zhiyuan...

Arxiv paper - MINERVA: Evaluating Complex Video Reasoning

06 May 2025

Contributed by Lukas

In this episode, we discuss MINERVA: Evaluating Complex Video Reasoning by Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa...

Arxiv paper - The Leaderboard Illusion

06 May 2025

Contributed by Lukas

In this episode, we discuss The Leaderboard Illusion by Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D'Souza, Sayash Kapoor, Ahmet Üstün, Sanmi Ko...

Arxiv paper - Towards Understanding Camera Motions in Any Video

05 May 2025

Contributed by Lukas

In this episode, we discuss Towards Understanding Camera Motions in Any Video by Zhiqiu Lin, Siyuan Cen, Daniel Jiang, Jay Karhade, Hewei Wang, Chanch...

Arxiv paper - Describe Anything: Detailed Localized Image and Video Captioning

29 Apr 2025

Contributed by Lukas

In this episode, we discuss Describe Anything: Detailed Localized Image and Video Captioning by Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao...

Arxiv paper - MCNC: MANIFOLD-CONSTRAINED REPARAMETERIZATION FOR NEURAL COMPRESSION

28 Apr 2025

Contributed by Lukas

In this episode, we discuss MCNC: MANIFOLD-CONSTRAINED REPARAMETERIZATION FOR NEURAL COMPRESSION by Chayne Thrash, Al...

Arxiv paper - Self-Improving Robust Preference Optimization

23 Apr 2025

Contributed by Lukas

In this episode, we discuss Self-Improving Robust Preference Optimization by Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, Mohammad G...

Arxiv paper - LLM Post-Training: A Deep Dive into Reasoning Large Language Models

22 Apr 2025

Contributed by Lukas

In this episode, we discuss LLM Post-Training: A Deep Dive into Reasoning Large Language Models by Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Mu...

Arxiv paper - Welcome to the Era of Experience

21 Apr 2025

Contributed by Lukas

In this episode, we discuss Welcome to the Era of Experience by David Silver, Richard S. Sutton. The paper discusses the forthcoming era of artificial...

Arxiv paper - MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

19 Apr 2025

Contributed by Lukas

In this episode, we discuss MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation by Sihyun Yu, Meera Hahn, Dan Kondrat...

Arxiv paper - InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

17 Apr 2025

Contributed by Lukas

In this episode, we discuss InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models by The authors of the paper...

Arxiv paper - EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise

16 Apr 2025

Contributed by Lukas

In this episode, we discuss EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise by Chao Li...

Arxiv paper - TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning

16 Apr 2025

Contributed by Lukas

In this episode, we discuss TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning by Xingjian Zhang, Siwei Wen, Wenjun Wu, Lei Huang. The paper...

Arxiv paper - Reasoning Models Don’t Always Say What They Think

09 Apr 2025

Contributed by Lukas

In this episode, we discuss Reasoning Models Don’t Always Say What They Think by The authors of the paper "Reasoning Models Don’t Always Say What ...

Arxiv paper - Slow-Fast Architecture for Video Multi-Modal Large Language Models

07 Apr 2025

Contributed by Lukas

In this episode, we discuss Slow-Fast Architecture for Video Multi-Modal Large Language Models by Min Shi, Shihao Wang, Chieh-Yun Chen, Jitesh Jain, K...

Arxiv paper - TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

04 Apr 2025

Contributed by Lukas

In this episode, we discuss TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes by Nikai Du, Zhennan Chen, Zhizhou Chen, Shan Ga...

Arxiv paper - VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

01 Apr 2025

Contributed by Lukas

In this episode, we discuss VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning by Ye Liu, Kevin Qinghong Lin, Chang Wen Chen, Mike Zheng Shou. ...

Arxiv paper - SynCity: Training-Free Generation of 3D Worlds

28 Mar 2025

Contributed by Lukas

In this episode, we discuss SynCity: Training-Free Generation of 3D Worlds by Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, A...

Arxiv paper - HD-EPIC: A Highly-Detailed Egocentric Video Dataset

26 Mar 2025

Contributed by Lukas

In this episode, we discuss HD-EPIC: A Highly-Detailed Egocentric Video Dataset by Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pol...

Arxiv paper - Video-T1: Test-Time Scaling for Video Generation

25 Mar 2025

Contributed by Lukas

In this episode, we discuss Video-T1: Test-Time Scaling for Video Generation by Fangfu Liu, Hanyang Wang, Yimo Cai, Kaiyan Zhang, Xiaohang Zhan, Yueqi...

Arxiv paper - Calibrated Multi-Preference Optimization for Aligning Diffusion Models

24 Mar 2025

Contributed by Lukas

In this episode, we discuss Calibrated Multi-Preference Optimization for Aligning Diffusion Models by Kyungmin Lee, Xiaohang Li, Qifei Wang, Junfeng H...

Arxiv paper - Personalize Anything for Free with Diffusion Transformer

21 Mar 2025

Contributed by Lukas

In this episode, we discuss Personalize Anything for Free with Diffusion Transformer by Haoran Feng, Zehuan Huang, Lin Li, Hairong Lv, Lu Sheng. The p...

Arxiv paper - Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

20 Mar 2025

Contributed by Lukas

In this episode, we discuss Story-Adapter: A Training-free Iterative Framework for Long Story Visualization by Jiawei Mao, Xiaoke Huang, Yunfei Xie, Y...

Arxiv paper - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

18 Mar 2025

Contributed by Lukas

In this episode, we discuss ReCamMaster: Camera-Controlled Generative Rendering from A Single Video by Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang...

Arxiv paper - Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

17 Mar 2025

Contributed by Lukas

In this episode, we discuss Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models by Wenxuan Huang, Bohan Jia, Zijie Zhai,...

Arxiv paper - MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

13 Mar 2025

Contributed by Lukas

In this episode, we discuss MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks by Jiacheng Chen, Tianhao Liang, Sherman Siu, Zheng...

Arxiv paper - TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

12 Mar 2025

Contributed by Lukas

In this episode, we discuss TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models by Mark YU, Wenbo Hu, Jinbo Xin...

Arxiv paper - PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

11 Mar 2025

Contributed by Lukas

In this episode, we discuss PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving by Mihir P...
