AI Breakdown
Episodes
Arxiv paper - HunyuanVideo: A Systematic Framework For Large Video Generative Models
12 Feb 2025
Contributed by Lukas
In this episode, we discuss HunyuanVideo: A Systematic Framework For Large Video Generative Models by Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuo...
Arxiv paper - s1: Simple test-time scaling
10 Feb 2025
Contributed by Lukas
In this episode, we discuss s1: Simple test-time scaling by Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirz...
Arxiv paper - Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
07 Feb 2025
Contributed by Lukas
In this episode, we discuss Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation. The authors of the paper are ...
Arxiv paper - MatAnyone: Stable Video Matting with Consistent Memory Propagation
07 Feb 2025
Contributed by Lukas
In this episode, we discuss MatAnyone: Stable Video Matting with Consistent Memory Propagation by Peiqing Yang, Shangchen Zhou, Jixin Zhao, Qingyi Tao...
Arxiv paper - Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
03 Feb 2025
Contributed by Lukas
In this episode, we discuss Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate by Yubo Wang, Xiang Yue, Wenhu Chen....
Arxiv paper - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
31 Jan 2025
Contributed by Lukas
In this episode, we discuss Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs by Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xing...
Arxiv paper - MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
30 Jan 2025
Contributed by Lukas
In this episode, we discuss MetaMorph: Multimodal Understanding and Generation via Instruction Tuning by Shengbang Tong, David Fan, Jiachen Zhu, Yunya...
Arxiv paper - Improving Video Generation with Human Feedback
29 Jan 2025
Contributed by Lukas
In this episode, we discuss Improving Video Generation with Human Feedback by Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zhen...
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
28 Jan 2025
Contributed by Lukas
In this episode, we discuss Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling. The authors of the paper are: ...
Arxiv paper - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
27 Jan 2025
Contributed by Lukas
In this episode, we discuss DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning by DeepSeek-AI. The paper introduces De...
Arxiv paper - Can We Generate Images with CoT? Let’s Verify and Reinforce Image Generation Step by Step
24 Jan 2025
Contributed by Lukas
In this episode, we discuss Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step by Ziyu Guo, Renrui Zhang, Cheng...
Arxiv paper - Improving Factuality with Explicit Working Memory
23 Jan 2025
Contributed by Lukas
In this episode, we discuss Improving Factuality with Explicit Working Memory by Mingda Chen, Yang Li, Karthik Padthe, Rulin Shao, Alicia Sun, Luke Ze...
Arxiv paper - Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
17 Jan 2025
Contributed by Lukas
In this episode, we discuss Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control by Zekai Gu, Rui Yan, Jiahao Lu, Peng...
Arxiv paper - FaceLift: Single Image to 3D Head with View Generation and GS-LRM
13 Jan 2025
Contributed by Lukas
In this episode, we discuss FaceLift: Single Image to 3D Head with View Generation and GS-LRM by Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu. Fac...
Arxiv paper - GenHMR: Generative Human Mesh Recovery
08 Jan 2025
Contributed by Lukas
In this episode, we discuss GenHMR: Generative Human Mesh Recovery by Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Srijan Das...
Arxiv paper - Video Creation by Demonstration
06 Jan 2025
Contributed by Lukas
In this episode, we discuss Video Creation by Demonstration by Yihong Sun, Hao Zhou, Liangzhe Yuan, Jennifer J. Sun, Yandong Li, Xuhui Jia, Hartwig Ad...
Arxiv paper - Byte Latent Transformer: Patches Scale Better Than Tokens
02 Jan 2025
Contributed by Lukas
In this episode, we discuss Byte Latent Transformer: Patches Scale Better Than Tokens by Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen,...
Arxiv paper - Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
17 Dec 2024
Contributed by Lukas
In this episode, we discuss Align3R: Aligned Monocular Depth Estimation for Dynamic Videos by Jiahao Lu, Tianyu Huang, Peng Li, Zhiyang Dou, Cheng Lin...
Arxiv paper - FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
17 Dec 2024
Contributed by Lukas
In this episode, we discuss FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion by Haonan Qiu, Shiwei Zhang, Yujie W...
Arxiv paper - ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
11 Dec 2024
Contributed by Lukas
In this episode, we discuss ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis by Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo...
Arxiv paper - o1-Coder: an o1 Replication for Coding
10 Dec 2024
Contributed by Lukas
In this episode, we discuss o1-Coder: an o1 Replication for Coding by Yuxiang Zhang, Shangxi Wu, Yuqi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jit...
Arxiv paper - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
06 Dec 2024
Contributed by Lukas
In this episode, we discuss DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning by Hao Bai, Yifei Zhou, Mert Cem...
ICLR 2025 submission - CYCLE-CONSISTENT LEARNING FOR JOINT LAYOUT-TO-IMAGE GENERATION AND OBJECT DETECTION
03 Dec 2024
Contributed by Lukas
In this episode, we discuss CYCLE-CONSISTENT LEARNING FOR JOINT LAYOUT-TO-IMAGE GENERATION AND OBJECT DETECTION. The paper's authors are listed as "...
Arxiv Paper - WonderWorld: Interactive 3D Scene Generation from a Single Image
26 Nov 2024
Contributed by Lukas
In this episode, we discuss WonderWorld: Interactive 3D Scene Generation from a Single Image by Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T....
Arxiv Paper - Hymba: A Hybrid-head Architecture for Small Language Models
22 Nov 2024
Contributed by Lukas
In this episode, we discuss Hymba: A Hybrid-head Architecture for Small Language Models by Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen...
Arxiv Paper - Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
21 Nov 2024
Contributed by Lukas
In this episode, we discuss Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation by Danny Halawi, Alexander Wei, Eric Wallace, Tony ...
Arxiv Paper - Video Instruction Tuning With Synthetic Data
20 Nov 2024
Contributed by Lukas
In this episode, we discuss Video Instruction Tuning With Synthetic Data by Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li...
Arxiv Paper - Generative Agent Simulations of 1,000 People
19 Nov 2024
Contributed by Lukas
In this episode, we discuss Generative Agent Simulations of 1,000 People by Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai...
NeurIPS 2024 - Moving Off-the-Grid: Scene-Grounded Video Representations
15 Nov 2024
Contributed by Lukas
In this episode, we discuss Moving Off-the-Grid: Scene-Grounded Video Representations by Sjoerd van Steenkiste, Daniel Zoran, Yi Yang, Yulia Rubanova,...
Arxiv Paper - Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
14 Nov 2024
Contributed by Lukas
In this episode, we discuss Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution by Peng Wang, Shuai Bai, Sinan Tan, ...
Arxiv Paper - FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
13 Nov 2024
Contributed by Lukas
In this episode, we discuss FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality by Zhengyao Lv, Chenyang Si, Junhao Song, ...
Arxiv Paper - Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
11 Nov 2024
Contributed by Lukas
In this episode, we discuss Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA by Sangmin Bae, Adam Fisch, Hrayr Harutyu...
Arxiv Paper - Long Context RAG Performance of Large Language Models
08 Nov 2024
Contributed by Lukas
In this episode, we discuss Long Context RAG Performance of Large Language Models by Quinn Leng, Jacob Portes, Sam Havens, Matei Zaharia, Michael Carb...
Arxiv Paper - NVLM: Open Frontier-Class Multimodal LLMs
05 Nov 2024
Contributed by Lukas
In this episode, we discuss NVLM: Open Frontier-Class Multimodal LLMs by Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuolin Yang, Zihan Liu, Jon Barker, Tu...
Arxiv Paper - ColPali: Efficient Document Retrieval with Vision Language Models
01 Nov 2024
Contributed by Lukas
In this episode, we discuss ColPali: Efficient Document Retrieval with Vision Language Models by Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani,...
Arxiv Paper - Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
31 Oct 2024
Contributed by Lukas
In this episode, we discuss Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models by Matt Deitke, Christopher Clark, Sang...
Arxiv Paper - Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
31 Oct 2024
Contributed by Lukas
In this episode, we discuss Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization by Mohammad Samragh, Iman Mi...
Arxiv Paper - Unbounded: A Generative Infinite Game of Character Life Simulation
29 Oct 2024
Contributed by Lukas
In this episode, we discuss Unbounded: A Generative Infinite Game of Character Life Simulation by Jialu Li, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Dav...
Arxiv Paper - Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
28 Oct 2024
Contributed by Lukas
In this episode, we discuss Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer? by Nishant Balepur, Feng Gu...
Arxiv Paper - LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
25 Oct 2024
Contributed by Lukas
In this episode, we discuss LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding by Xiaoqian Shen, Yunyang Xiong, Changsh...
Arxiv Paper - When Does Perceptual Alignment Benefit Vision Representations?
23 Oct 2024
Contributed by Lukas
In this episode, we discuss When Does Perceptual Alignment Benefit Vision Representations? by Shobhita Sundaram, Stephanie Fu, Lukas Muttenthaler, Net...
Arxiv paper - SceneCraft: Layout-Guided 3D Scene Generation
22 Oct 2024
Contributed by Lukas
In this episode, we discuss SceneCraft: Layout-Guided 3D Scene Generation by Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang. SceneCraft is a metho...
arxiv preprint - A Tale of Tails: Model Collapse as a Change of Scaling Laws
18 Oct 2024
Contributed by Lukas
In this episode, we discuss A Tale of Tails: Model Collapse as a Change of Scaling Laws by Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Ju...
arxiv preprint - Thinking LLMs: General Instruction Following with Thought Generation
17 Oct 2024
Contributed by Lukas
In this episode, we discuss Thinking LLMs: General Instruction Following with Thought Generation by Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao,...
arxiv preprint - Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
16 Oct 2024
Contributed by Lukas
In this episode, we discuss Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think by Sihyun Yu, Sangkyung ...
arxiv preprint - F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
14 Oct 2024
Contributed by Lukas
In this episode, we discuss F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching by Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi...
arxiv preprint - One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
11 Oct 2024
Contributed by Lukas
In this episode, we discuss One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation by Fabian Paischer, Lukas Hauzenberger,...
arxiv preprint - Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
10 Oct 2024
Contributed by Lukas
In this episode, we discuss Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models by Seyedmorteza Sadat, Otmar Hilliges...
arxiv preprint - NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING
07 Oct 2024
Contributed by Lukas
In this episode, we discuss NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING. The authors of the paper "NEPTUNE: THE LONG ORBIT TO B...
arxiv preprint - SHIC: Shape-Image Correspondences with no Keypoint Supervision
04 Oct 2024
Contributed by Lukas
In this episode, we discuss SHIC: Shape-Image Correspondences with no Keypoint Supervision by Aleksandar Shtedritski, Christian Rupprecht, Andrea Veda...
arxiv preprint - E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
03 Oct 2024
Contributed by Lukas
In this episode, we discuss E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding by Ye Liu, Zongyang Ma, Zhongang Qi, Yang Wu, Ying...
arxiv preprint - LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
01 Oct 2024
Contributed by Lukas
In this episode, we discuss LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness by Chenming Zhu, Tai Wang, Wenwei Zhang, Jia...
arxiv preprint - DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
28 Sep 2024
Contributed by Lukas
In this episode, we discuss DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos by Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie...
arxiv preprint - Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
27 Sep 2024
Contributed by Lukas
In this episode, we discuss Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale by Fan Zhou, Zengzhi Wang, Qian Liu, Ju...
arxiv preprint - Phantom of Latent for Large Language and Vision Models
24 Sep 2024
Contributed by Lukas
In this episode, we discuss Phantom of Latent for Large Language and Vision Models by Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong...
arxiv preprint - Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
20 Sep 2024
Contributed by Lukas
In this episode, we discuss Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think by Gonzalo Martin Garcia, Karim Abou Zeid, Christi...
arxiv preprint - On the Diagram of Thought
19 Sep 2024
Contributed by Lukas
In this episode, we discuss On the Diagram of Thought by Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao. Diagram of Thought (DoT) is a framework for mode...
arxiv preprint - Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
17 Sep 2024
Contributed by Lukas
In this episode, we discuss Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources by Alisia Lupidi, Carlos Gemmell, Nicol...
arxiv preprint - SongCreator: Lyrics-based Universal Song Generation
12 Sep 2024
Contributed by Lukas
In this episode, we discuss SongCreator: Lyrics-based Universal Song Generation by Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu ...
arxiv preprint - Achieving Human Level Competitive Robot Table Tennis
11 Sep 2024
Contributed by Lukas
In this episode, we discuss Achieving Human Level Competitive Robot Table Tennis by David B. D'Ambrosio, Saminda Abeyruwan, Laura Graesser, Atil Iscen...
arxiv preprint - Sapiens: Foundation for Human Vision Models
09 Sep 2024
Contributed by Lukas
In this episode, we discuss Sapiens: Foundation for Human Vision Models by Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin Jam...
arxiv preprint - Re-Reading Improves Reasoning in Large Language Models
06 Sep 2024
Contributed by Lukas
In this episode, we discuss Re-Reading Improves Reasoning in Large Language Models by Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong ...
arxiv preprint - SPIRE: Semantic Prompt-Driven Image Restoration
03 Sep 2024
Contributed by Lukas
In this episode, we discuss SPIRE: Semantic Prompt-Driven Image Restoration by Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanf...
arxiv preprint - Automated Design of Agentic Systems
31 Aug 2024
Contributed by Lukas
In this episode, we discuss Automated Design of Agentic Systems by Shengran Hu, Cong Lu, Jeff Clune. The paper introduces Automated Design of Agentic ...
arxiv preprint - Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
28 Aug 2024
Contributed by Lukas
In this episode, we discuss Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model by Chunting Zhou, Lili Yu, Arun Babu, Ku...
arxiv preprint - To Code, or Not To Code? Exploring Impact of Code in Pre-training
26 Aug 2024
Contributed by Lukas
In this episode, we discuss To Code, or Not To Code? Exploring Impact of Code in Pre-training by Viraat Aryabumi, Yixuan Su, Raymond Ma, Adrien Moriso...
arxiv preprint - Segment Anything with Multiple Modalities
23 Aug 2024
Contributed by Lukas
In this episode, we discuss Segment Anything with Multiple Modalities by Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Naoto Yokoya, Shijian Lu. The pap...
arxiv preprint - JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
20 Aug 2024
Contributed by Lukas
In this episode, we discuss JPEG-LM: LLMs as Image Generators with Canonical Codec Representations by Xiaochuang Han, Marjan Ghazvininejad, Pang Wei K...
arxiv preprint - Mission: Impossible Language Models
19 Aug 2024
Contributed by Lukas
In this episode, we discuss Mission: Impossible Language Models by Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Po...
arxiv preprint - Learning Task Decomposition to Assist Humans in Competitive Programming
16 Aug 2024
Contributed by Lukas
In this episode, we discuss Learning Task Decomposition to Assist Humans in Competitive Programming by Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, ...
arxiv preprint - IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
13 Aug 2024
Contributed by Lukas
In this episode, we discuss IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts by Ciara Rowles, Shimon Vainer,...
arxiv preprint - Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
10 Aug 2024
Contributed by Lukas
In this episode, we discuss Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters by Charlie Snell, Jaehoon Lee,...
arxiv preprint - Language Model Can Listen While Speaking
09 Aug 2024
Contributed by Lukas
In this episode, we discuss Language Model Can Listen While Speaking by Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan ...
arxiv preprint - Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
07 Aug 2024
Contributed by Lukas
In this episode, we discuss Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning by Trapoom Ukarapol, Zhicheng Lee, Amy...
arxiv preprint - Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
06 Aug 2024
Contributed by Lukas
In this episode, we discuss Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle by Zhenyu Tang, Junwu Zhan...
arxiv preprint - Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
06 Aug 2024
Contributed by Lukas
In this episode, we discuss Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent by Shanbo Cheng, Zhichao Huang,...
arxiv preprint - Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
31 Jul 2024
Contributed by Lukas
In this episode, we discuss Graph-enhanced Large Language Models in Asynchronous Plan Reasoning by Fangru Lin, Emanuele La Malfa, Valentin Hofmann, El...
arxiv preprint - LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
30 Jul 2024
Contributed by Lukas
In this episode, we discuss LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference by Qichen Fu, Minsik Cho, Thomas Merth, Sachin Meh...
arxiv preprint - OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
29 Jul 2024
Contributed by Lukas
In this episode, we discuss OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person by Ke Sun, Jian Cao, Qi Wang, Linrui Tian,...
arxiv preprint - DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
27 Jul 2024
Contributed by Lukas
In this episode, we discuss DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM by Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenh...
arxiv preprint - Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
23 Jul 2024
Contributed by Lukas
In this episode, we discuss Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning by Kaiwen Wang, Rahul Kidambi, R...
arxiv preprint - Chameleon: Mixed-Modal Early-Fusion Foundation Models
22 Jul 2024
Contributed by Lukas
In this episode, we discuss Chameleon: Mixed-Modal Early-Fusion Foundation Models by Chameleon Team. The paper introduces Chameleon, a family of model...
arxiv preprint - Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
18 Jul 2024
Contributed by Lukas
In this episode, we discuss Goldfish: Vision-Language Understanding of Arbitrarily Long Videos by Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, ...
arxiv preprint - Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
17 Jul 2024
Contributed by Lukas
In this episode, we discuss Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity by Santiago Pascual, Chunghsin Yeh, Ioannis Tsia...
arxiv preprint - Human-like Episodic Memory for Infinite Context LLMs
15 Jul 2024
Contributed by Lukas
In this episode, we discuss Human-like Episodic Memory for Infinite Context LLMs by Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Chri...
arxiv preprint - Learning to (Learn at Test Time): RNNs with Expressive Hidden States
12 Jul 2024
Contributed by Lukas
In this episode, we discuss Learning to (Learn at Test Time): RNNs with Expressive Hidden States by Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun V...
arxiv preprint - Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
11 Jul 2024
Contributed by Lukas
In this episode, we discuss Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions by Yu-Guan Hsieh, Cheng-Yu Hsieh,...
arxiv preprint - Evaluating Human Alignment and Model Faithfulness of LLM Rationale
09 Jul 2024
Contributed by Lukas
In this episode, we discuss Evaluating Human Alignment and Model Faithfulness of LLM Rationale by Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng. The p...
arxiv preprint - Detection and Measurement of Syntactic Templates in Generated Text
08 Jul 2024
Contributed by Lukas
In this episode, we discuss Detection and Measurement of Syntactic Templates in Generated Text by Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C...
arxiv preprint - From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
01 Jul 2024
Contributed by Lukas
In this episode, we discuss From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data by Zhe...
arxiv preprint - MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
27 Jun 2024
Contributed by Lukas
In this episode, we discuss MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning by Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yin...
arxiv preprint - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
26 Jun 2024
Contributed by Lukas
In this episode, we discuss 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities by Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, A...
arxiv preprint - VideoLLM-online: Online Video Large Language Model for Streaming Video
25 Jun 2024
Contributed by Lukas
In this episode, we discuss VideoLLM-online: Online Video Large Language Model for Streaming Video by Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghon...
arxiv preprint - EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
24 Jun 2024
Contributed by Lukas
In this episode, we discuss EvTexture: Event-driven Texture Enhancement for Video Super-Resolution by Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun....
arxiv preprint - MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
22 Jun 2024
Contributed by Lukas
In this episode, we discuss MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model by...
arxiv preprint - An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
20 Jun 2024
Contributed by Lukas
In this episode, we discuss An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels by Duy-Kien Nguyen, Mahmoud Assran,...
arxiv preprint - Graphic Design with Large Multimodal Model
20 Jun 2024
Contributed by Lukas
In this episode, we discuss Graphic Design with Large Multimodal Model by Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie ...
arxiv preprint - LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
18 Jun 2024
Contributed by Lukas
In this episode, we discuss LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning by Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quen...
arxiv preprint - Transformers need glasses! Information over-squashing in language tasks
17 Jun 2024
Contributed by Lukas
In this episode, we discuss Transformers need glasses! Information over-squashing in language tasks by Federico Barbero, Andrea Banino, Steven Kapturo...
arxiv preprint - Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback
14 Jun 2024
Contributed by Lukas
In this episode, we discuss Show, Don't Tell: Aligning Language Models with Demonstrated Feedback by Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao...