AI Breakdown

arxiv preprint - MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

27 Jun 2024

Contributed by Lukas

In this episode, we discuss MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning by Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yin...

arxiv preprint - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

26 Jun 2024

Contributed by Lukas

In this episode, we discuss 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities by Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, A...

arxiv preprint - VideoLLM-online: Online Video Large Language Model for Streaming Video

25 Jun 2024

Contributed by Lukas

In this episode, we discuss VideoLLM-online: Online Video Large Language Model for Streaming Video by Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghon...

arxiv preprint - EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

24 Jun 2024

Contributed by Lukas

In this episode, we discuss EvTexture: Event-driven Texture Enhancement for Video Super-Resolution by Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun....

arxiv preprint - MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

22 Jun 2024

Contributed by Lukas

In this episode, we discuss MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model by...

arxiv preprint - An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

20 Jun 2024

Contributed by Lukas

In this episode, we discuss An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels by Duy-Kien Nguyen, Mahmoud Assran,...

arxiv preprint - Graphic Design with Large Multimodal Model

20 Jun 2024

Contributed by Lukas

In this episode, we discuss Graphic Design with Large Multimodal Model by Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie ...

arxiv preprint - LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

18 Jun 2024

Contributed by Lukas

In this episode, we discuss LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning by Dantong Niu, Yuvan Sharma, Giscard Biamby, Jerome Quen...

arxiv preprint - Transformers need glasses! Information over-squashing in language tasks

17 Jun 2024

Contributed by Lukas

In this episode, we discuss Transformers need glasses! Information over-squashing in language tasks by Federico Barbero, Andrea Banino, Steven Kapturo...

arxiv preprint - Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback

14 Jun 2024

Contributed by Lukas

In this episode, we discuss Show, Don't Tell: Aligning Language Models with Demonstrated Feedback by Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao...

arxiv preprint - TextGrad: Automatic "Differentiation" via Text

13 Jun 2024

Contributed by Lukas

In this episode, we discuss TextGrad: Automatic "Differentiation" via Text by Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, C...

arxiv preprint - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

12 Jun 2024

Contributed by Lukas

In this episode, we discuss SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales by Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoz...

arxiv preprint - Open-Endedness is Essential for Artificial Superhuman Intelligence

11 Jun 2024

Contributed by Lukas

In this episode, we discuss Open-Endedness is Essential for Artificial Superhuman Intelligence by Edward Hughes, Michael Dennis, Jack Parker-Holder, F...

arxiv preprint - To Believe or Not to Believe Your LLM

08 Jun 2024

Contributed by Lukas

In this episode, we discuss To Believe or Not to Believe Your LLM by Yasin Abbasi Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári. The st...

arxiv preprint - Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

06 Jun 2024

Contributed by Lukas

In this episode, we discuss Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts by Chunjing Gan, Dan Y...

arxiv preprint - Contextual Position Encoding: Learning to Count What’s Important

04 Jun 2024

Contributed by Lukas

In this episode, we discuss Contextual Position Encoding: Learning to Count What's Important by Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar S...

arxiv preprint - Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

03 Jun 2024

Contributed by Lukas

In this episode, we discuss Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis by Chaoyou Fu, Yuhan Da...

arxiv preprint - VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

31 May 2024

Contributed by Lukas

In this episode, we discuss VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos by Ziyang Wang, Shoubin Yu, Elias Ste...

arxiv preprint - CinePile: A Long Video Question Answering Dataset and Benchmark

30 May 2024

Contributed by Lukas

In this episode, we discuss CinePile: A Long Video Question Answering Dataset and Benchmark by Ruchit Rawal, Khalid Saifullah, Ronen Basri, David Jaco...

arxiv preprint - Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

29 May 2024

Contributed by Lukas

In this episode, we discuss Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum by Hadi Pouransari, Chun-Liang Li, Jen...

arxiv preprint - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

28 May 2024

Contributed by Lukas

In this episode, we discuss SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering by John Yang, Carlos E. Jimenez, Alexander Wett...

arxiv preprint - Octo: An Open-Source Generalist Robot Policy

24 May 2024

Contributed by Lukas

In this episode, we discuss Octo: An Open-Source Generalist Robot Policy by Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier...

arxiv preprint - Layer-Condensed KV Cache for Efficient Inference of Large Language Models

23 May 2024

Contributed by Lukas

In this episode, we discuss Layer-Condensed KV Cache for Efficient Inference of Large Language Models by Haoyi Wu, Kewei Tu. The paper addresses the s...

arxiv preprint - Observational Scaling Laws and the Predictability of Language Model Performance

22 May 2024

Contributed by Lukas

In this episode, we discuss Observational Scaling Laws and the Predictability of Language Model Performance by Yangjun Ruan, Chris J. Maddison, Tatsun...

arxiv preprint - Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization

21 May 2024

Contributed by Lukas

In this episode, we discuss Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization by Costas Mavromatis, Petros Karypis, George Karypis. ...

arxiv preprint - The Platonic Representation Hypothesis

20 May 2024

Contributed by Lukas

In this episode, we discuss The Platonic Representation Hypothesis by Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola. The paper argues that ...

arxiv preprint - Many-Shot In-Context Learning in Multimodal Foundation Models

18 May 2024

Contributed by Lukas

In this episode, we discuss Many-Shot In-Context Learning in Multimodal Foundation Models by Yixing Jiang, Jeremy Irvin, Ji Hun Wang, Muhammad Ahmed C...

arxiv preprint - Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

16 May 2024

Contributed by Lukas

In this episode, we discuss Naturalistic Music Decoding from EEG Data via Latent Diffusion Models by Emilian Postolache, Natalia Polouliakh, Hiroaki K...

arxiv preprint - The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

15 May 2024

Contributed by Lukas

In this episode, we discuss The Chosen One: Consistent Characters in Text-to-Image Diffusion Models by Omri Avrahami, Amir Hertz, Yael Vinker, Moab Ar...

arxiv preprint - Memory Mosaics

14 May 2024

Contributed by Lukas

In this episode, we discuss Memory Mosaics by Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Léon Bottou. Memory Mosaics are collective n...

arxiv preprint - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

13 May 2024

Contributed by Lukas

In this episode, we discuss Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? by Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Ami...

arxiv preprint - LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

10 May 2024

Contributed by Lukas

In this episode, we discuss LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models by Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai,...

arxiv preprint - WildChat: 1M ChatGPT Interaction Logs in the Wild

09 May 2024

Contributed by Lukas

In this episode, we discuss WildChat: 1M ChatGPT Interaction Logs in the Wild by Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yunt...

arxiv preprint - Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

08 May 2024

Contributed by Lukas

In this episode, we discuss Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models by Mosh Levy, Alo...

arxiv preprint - NOLA: Compressing LoRA using Linear Combination of Random Basis

07 May 2024

Contributed by Lukas

In this episode, we discuss NOLA: Compressing LoRA using Linear Combination of Random Basis by Soroush Abbasi Koohpayegani, KL Navaneet, Parsa Noorali...

arxiv preprint - StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

06 May 2024

Contributed by Lukas

In this episode, we discuss StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation by Yupeng Zhou, Daquan Zhou, Ming-Ming...

arxiv preprint - Iterative Reasoning Preference Optimization

03 May 2024

Contributed by Lukas

In this episode, we discuss Iterative Reasoning Preference Optimization by Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaat...

arxiv preprint - Better & Faster Large Language Models via Multi-token Prediction

02 May 2024

Contributed by Lukas

In this episode, we discuss Better & Faster Large Language Models via Multi-token Prediction by Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozi...

arxiv preprint - Make Your LLM Fully Utilize the Context

01 May 2024

Contributed by Lukas

In this episode, we discuss Make Your LLM Fully Utilize the Context by Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou. The paper "Ma...

arxiv preprint - Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

30 Apr 2024

Contributed by Lukas

In this episode, we discuss Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation by Jaemin Cho, Yush...

arxiv preprint - PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

29 Apr 2024

Contributed by Lukas

In this episode, we discuss PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning by Lin Xu, Yilin Zhao, Daquan Zho...

arxiv preprint - Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

26 Apr 2024

Contributed by Lukas

In this episode, we discuss Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare by Emre Can Acikgoz, Osman Batur İ...

arxiv preprint - SnapKV: LLM Knows What You are Looking for Before Generation

25 Apr 2024

Contributed by Lukas

In this episode, we discuss SnapKV: LLM Knows What You are Looking for Before Generation by Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, A...

arxiv preprint - CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

24 Apr 2024

Contributed by Lukas

In this episode, we discuss CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models by Je-Yong Lee, Donghyun Lee, Genghan Zhang, M...

arxiv preprint - SpaceByte: Towards Deleting Tokenization from Large Language Modeling

23 Apr 2024

Contributed by Lukas

In this episode, we discuss SpaceByte: Towards Deleting Tokenization from Large Language Modeling by Kevin Slagle. Tokenization in large language mode...

arxiv preprint - TextSquare: Scaling up Text-Centric Visual Instruction Tuning

22 Apr 2024

Contributed by Lukas

In this episode, we discuss TextSquare: Scaling up Text-Centric Visual Instruction Tuning by Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong W...

arxiv preprint - EdgeFusion: On-Device Text-to-Image Generation

19 Apr 2024

Contributed by Lukas

In this episode, we discuss EdgeFusion: On-Device Text-to-Image Generation by Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeon...

arxiv preprint - VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

18 Apr 2024

Contributed by Lukas

In this episode, we discuss VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time by Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang,...

arxiv preprint - Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

17 Apr 2024

Contributed by Lukas

In this episode, we discuss Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models by Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhi...

arxiv preprint - High-Dimension Human Value Representation in Large Language Models

16 Apr 2024

Contributed by Lukas

In this episode, we discuss High-Dimension Human Value Representation in Large Language Models by Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila K...

arxiv preprint - Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

15 Apr 2024

Contributed by Lukas

In this episode, we discuss Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck by Nathan Godey, ...

arxiv preprint - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

12 Apr 2024

Contributed by Lukas

In this episode, we discuss Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention by Tsendsuren Munkhdalai, Manaal Fa...

arxiv preprint - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

11 Apr 2024

Contributed by Lukas

In this episode, we discuss Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs by Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, A...

arxiv preprint - Evaluating Text-to-Visual Generation with Image-to-Text Generation

10 Apr 2024

Contributed by Lukas

In this episode, we discuss Evaluating Text-to-Visual Generation with Image-to-Text Generation by Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide...

arxiv preprint - Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

09 Apr 2024

Contributed by Lukas

In this episode, we discuss Future Lens: Anticipating Subsequent Tokens from a Single Hidden State by Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. W...

arxiv preprint - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

08 Apr 2024

Contributed by Lukas

In this episode, we discuss Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity by Soyeong Jeong, Ji...

arxiv preprint - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

05 Apr 2024

Contributed by Lukas

In this episode, we discuss Mixture-of-Depths: Dynamically allocating compute in transformer-based language models by David Raposo, Sam Ritter, Blake ...

arxiv preprint - WavLLM: Towards Robust and Adaptive Speech Large Language Model

04 Apr 2024

Contributed by Lukas

In this episode, we discuss WavLLM: Towards Robust and Adaptive Speech Large Language Model by Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun...

arxiv preprint - Gecko: Versatile Text Embeddings Distilled from Large Language Models

03 Apr 2024

Contributed by Lukas

In this episode, we discuss Gecko: Versatile Text Embeddings Distilled from Large Language Models by Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, ...

arxiv preprint - ReALM: Reference Resolution As Language Modeling

02 Apr 2024

Contributed by Lukas

In this episode, we discuss ReALM: Reference Resolution As Language Modeling by Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Ozyildirim, Pratham...

arxiv preprint - sDPO: Don’t Use Your Data All at Once

01 Apr 2024

Contributed by Lukas

In this episode, we discuss sDPO: Don't Use Your Data All at Once by Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun...

arxiv preprint - LITA: Language Instructed Temporal-Localization Assistant

29 Mar 2024

Contributed by Lukas

In this episode, we discuss LITA: Language Instructed Temporal-Localization Assistant by De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yi...

arxiv preprint - AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

28 Mar 2024

Contributed by Lukas

In this episode, we discuss AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks by Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu...

arxiv preprint - InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

27 Mar 2024

Contributed by Lukas

In this episode, we discuss InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding by Yi Wang, Kunchang Li, Xinhao Li, Jiash...

arxiv preprint - Giraffe: Adventures in Expanding Context Lengths in LLMs

26 Mar 2024

Contributed by Lukas

In this episode, we discuss Giraffe: Adventures in Expanding Context Lengths in LLMs by Arka Pal, Deep Karkhanis, Manley Roberts, Samuel Dooley, Arvin...

arxiv preprint - Explorative Inbetweening of Time and Space

25 Mar 2024

Contributed by Lukas

In this episode, we discuss Explorative Inbetweening of Time and Space by Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Micha...

arxiv preprint - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

22 Mar 2024

Contributed by Lukas

In this episode, we discuss Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman, Georges Harik, Yijia Shao, Var...

arxiv preprint - Evaluating Large Language Models at Evaluating Instruction Following

21 Mar 2024

Contributed by Lukas

In this episode, we discuss Evaluating Large Language Models at Evaluating Instruction Following by Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tan...

arxiv preprint - Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

20 Mar 2024

Contributed by Lukas

In this episode, we discuss Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation by Se-eun Yoon, Zhankui H...

arxiv preprint - Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

19 Mar 2024

Contributed by Lukas

In this episode, we discuss Branch-Solve-Merge Improves Large Language Model Evaluation and Generation by Swarnadeep Saha, Omer Levy, Asli Celikyilmaz...

arxiv preprint - MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

18 Mar 2024

Contributed by Lukas

In this episode, we discuss MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training by Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconn...

arxiv preprint - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

15 Mar 2024

Contributed by Lukas

In this episode, we discuss Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman, Georges Harik, Yijia Shao, Var...

arxiv preprint - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

14 Mar 2024

Contributed by Lukas

In this episode, we discuss WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? by Alexandre Drouin, Maxime Gasse, Massimo C...

arxiv preprint - Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

13 Mar 2024

Contributed by Lukas

In this episode, we discuss Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings by Sahand Sharifzadeh, Christos Kapl...

arxiv preprint - Is Cosine-Similarity of Embeddings Really About Similarity?

12 Mar 2024

Contributed by Lukas

In this episode, we discuss Is Cosine-Similarity of Embeddings Really About Similarity? by Harald Steck, Chaitanya Ekanadham, Nathan Kallus. The paper...

arxiv preprint - A Generative Approach for Wikipedia-Scale Visual Entity Recognition

11 Mar 2024

Contributed by Lukas

In this episode, we discuss A Generative Approach for Wikipedia-Scale Visual Entity Recognition by Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordeli...

arxiv preprint - Self-correcting LLM-controlled Diffusion Models

08 Mar 2024

Contributed by Lukas

In this episode, we discuss Self-correcting LLM-controlled Diffusion Models by Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell. T...

arxiv preprint - tinyBenchmarks: evaluating LLMs with fewer examples

08 Mar 2024

Contributed by Lukas

In this episode, we discuss tinyBenchmarks: evaluating LLMs with fewer examples by Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun ...

arxiv preprint - Asymmetry in Low-Rank Adapters of Foundation Models

06 Mar 2024

Contributed by Lukas

In this episode, we discuss Asymmetry in Low-Rank Adapters of Foundation Models by Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Oc...

arxiv preprint - When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

05 Mar 2024

Contributed by Lukas

In this episode, we discuss When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method by Biao Zhang, Zhongtao Liu, Colin Cher...

arxiv preprint - EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

04 Mar 2024

Contributed by Lukas

In this episode, we discuss EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions b...

arxiv preprint - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

01 Mar 2024

Contributed by Lukas

In this episode, we discuss The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits by Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhu...

arxiv preprint - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

29 Feb 2024

Contributed by Lukas

In this episode, we discuss Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models by Yijia Shao, Yucheng Jiang, Theodor...

arxiv preprint - LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

28 Feb 2024

Contributed by Lukas

In this episode, we discuss LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning by Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang,...

arxiv preprint - Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

27 Feb 2024

Contributed by Lukas

In this episode, we discuss Branch-Solve-Merge Improves Large Language Model Evaluation and Generation by Swarnadeep Saha, Omer Levy, Asli Celikyilmaz...

arxiv preprint - SciMON: Scientific Inspiration Machines Optimized for Novelty

26 Feb 2024

Contributed by Lukas

In this episode, we discuss SciMON: Scientific Inspiration Machines Optimized for Novelty by Qingyun Wang, Doug Downey, Heng Ji, Tom Hope. The paper p...

arxiv preprint - Speculative Streaming: Fast LLM Inference without Auxiliary Models

23 Feb 2024

Contributed by Lukas

In this episode, we discuss Speculative Streaming: Fast LLM Inference without Auxiliary Models by Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry...

arxiv preprint - LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

22 Feb 2024

Contributed by Lukas

In this episode, we discuss LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models by Yanwei Li, Chengyao Wang, Jiaya Jia. The paper introduce...

arxiv preprint - UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities

21 Feb 2024

Contributed by Lukas

In this episode, we discuss UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities by Hejia Geng, Boxun Xu, Peng...

arxiv preprint - Guiding Instruction-based Image Editing via Multimodal Large Language Models

20 Feb 2024

Contributed by Lukas

In this episode, we discuss Guiding Instruction-based Image Editing via Multimodal Large Language Models by Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William ...

arxiv preprint - Spectral State Space Models

16 Feb 2024

Contributed by Lukas

In this episode, we discuss Spectral State Space Models by Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan. The paper introduces a new type of state...

arxiv preprint - More Agents Is All You Need

15 Feb 2024

Contributed by Lukas

In this episode, we discuss More Agents Is All You Need by Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye. The study demonstrates that the effe...

arxiv preprint - World Model on Million-Length Video And Language With RingAttention

14 Feb 2024

Contributed by Lukas

In this episode, we discuss World Model on Million-Length Video And Language With RingAttention by Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel. ...

arxiv preprint - Learning Video Representations from Large Language Models

13 Feb 2024

Contributed by Lukas

In this episode, we discuss Learning Video Representations from Large Language Models by Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar. T...

arxiv preprint - Can Large Language Models Understand Context?

12 Feb 2024

Contributed by Lukas

In this episode, we discuss Can Large Language Models Understand Context? by Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Pi...

arxiv preprint - Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

09 Feb 2024

Contributed by Lukas

In this episode, we discuss Long Story Short: a Summarize-then-Search Method for Long Video Question Answering by Jiwan Chung, Youngjae Yu. The paper ...

arxiv preprint - System 2 Attention (is something you might need too)

08 Feb 2024

Contributed by Lukas

In this episode, we discuss System 2 Attention (is something you might need too) by Jason Weston, Sainbayar Sukhbaatar. The paper introduces System 2 ...

arxiv preprint - DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

07 Feb 2024

Contributed by Lukas

In this episode, we discuss DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models by Zhihong Shao, Peiyi Wang, Qihao Zhu,...

arxiv preprint - KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

06 Feb 2024

Contributed by Lukas

In this episode, we discuss KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization by Coleman Hooper, Sehoon Kim, Hiva Mo...

arxiv preprint - Language Model Inversion

05 Feb 2024

Contributed by Lukas

In this episode, we discuss Language Model Inversion by John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, Alexander M. Rush. The paper e...
