AI Breakdown

arxiv preprint - Tree Prompting: Efficient Task Adaptation without Fine-Tuning

02 Feb 2024

Contributed by Lukas

In this episode, we discuss Tree Prompting: Efficient Task Adaptation without Fine-Tuning by John X. Morris, Chandan Singh, Alexander M. Rush, Jianfen...

arxiv preprint - Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

01 Feb 2024

Contributed by Lukas

In this episode, we discuss Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens by Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Y...

arxiv preprint - Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

31 Jan 2024

Contributed by Lukas

In this episode, we discuss Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning by Fuxiao Liu, Kevin Lin, Linjie Li, Ji...

arxiv preprint - RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

30 Jan 2024

Contributed by Lukas

In this episode, we discuss RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture by Angels Balaguer, Vinamra Benara, Renato Luiz ...

arxiv preprint - SliceGPT: Compress Large Language Models by Deleting Rows and Columns

29 Jan 2024

Contributed by Lukas

In this episode, we discuss SliceGPT: Compress Large Language Models by Deleting Rows and Columns by Saleh Ashkboos, Maximilian L. Croci, Marcelo Genn...

arxiv preprint - Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

26 Jan 2024

Contributed by Lukas

In this episode, we discuss Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video by Shashanka Venkataramanan, Mamsha...

arxiv preprint - MambaByte: Token-free Selective State Space Model

25 Jan 2024

Contributed by Lukas

In this episode, we discuss MambaByte: Token-free Selective State Space Model by Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush...

arxiv preprint - Lumiere: A Space-Time Diffusion Model for Video Generation

24 Jan 2024

Contributed by Lukas

In this episode, we discuss Lumiere: A Space-Time Diffusion Model for Video Generation by Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni ...

arxiv preprint - Self-Rewarding Language Models

23 Jan 2024

Contributed by Lukas

In this episode, we discuss Self-Rewarding Language Models by Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Sainbayar Sukhbaatar, Jing Xu, Jason W...

arxiv preprint - Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

22 Jan 2024

Contributed by Lukas

In this episode, we discuss Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, J...

arxiv preprint - MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding

19 Jan 2024

Contributed by Lukas

In this episode, we discuss MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding by Hongjie Zhang, Yi Liu, Lu Dong, Yi...

arxiv preprint - Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

18 Jan 2024

Contributed by Lukas

In this episode, we discuss Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model by Lianghui Zhu, Bencheng Liao...

arxiv preprint - Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

17 Jan 2024

Contributed by Lukas

In this episode, we discuss Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models by Asma Ghandeharioun, Avi Caci...

arxiv preprint - Time Travel in LLMs: Tracing Data Contamination in Large Language Models

16 Jan 2024

Contributed by Lukas

In this episode, we discuss Time Travel in LLMs: Tracing Data Contamination in Large Language Models by Shahriar Golchin, Mihai Surdeanu. The paper pr...

arxiv preprint - InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

12 Jan 2024

Contributed by Lukas

In this episode, we discuss InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes by Mohamad Shahbazi, Liesbeth Claessens, Michael Nieme...

arxiv preprint - A Simple LLM Framework for Long-Range Video Question-Answering

11 Jan 2024

Contributed by Lukas

In this episode, we discuss A Simple LLM Framework for Long-Range Video Question-Answering by Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Sh...

arxiv preprint - Mixtral of Experts

09 Jan 2024

Contributed by Lukas

In this episode, we discuss Mixtral of Experts by Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford,...

arxiv preprint - Weight subcloning: direct initialization of transformers using larger pretrained ones

08 Jan 2024

Contributed by Lukas

In this episode we discuss Weight subcloning: direct initialization of transformers using larger pretrained ones by Mohammad Samragh, Mehrdad Fara...

arxiv preprint - Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

05 Jan 2024

Contributed by Lukas

In this episode we discuss Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task by Maya Okawa, Ekdeep S...

arxiv preprint - LLM in a flash: Efficient Large Language Model Inference with Limited Memory

05 Jan 2024

Contributed by Lukas

In this episode, we discuss LLM in a flash: Efficient Large Language Model Inference with Limited Memory by Keivan Alizadeh, Iman Mirzadeh, Dmitry Bel...

arxiv preprint - The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

02 Jan 2024

Contributed by Lukas

In this episode, we discuss The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction by Pratyusha Sharma, Jor...

arxiv preprint - DreaMoving: A Human Video Generation Framework based on Diffusion Models

29 Dec 2023

Contributed by Lukas

In this episode we discuss DreaMoving: A Human Video Generation Framework based on Diffusion Models by Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao...

arxiv preprint - Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

28 Dec 2023

Contributed by Lukas

In this episode we discuss Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution by Mostafa Dehghani, Basil Mustafa, Josi...

arxiv preprint - UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

28 Dec 2023

Contributed by Lukas

In this episode, we discuss UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces by Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehu...

arxiv preprint - LongNet: Scaling Transformers to 1,000,000,000 Tokens

27 Dec 2023

Contributed by Lukas

In this episode we discuss LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang...

arxiv preprint - MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

27 Dec 2023

Contributed by Lukas

In this episode, we discuss MotionCtrl: A Unified and Flexible Motion Controller for Video Generation by Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tians...

arxiv preprint - Model-tuning Via Prompts Makes NLP Models Adversarially Robust

26 Dec 2023

Contributed by Lukas

In this episode we discuss Model-tuning Via Prompts Makes NLP Models Adversarially Robust by Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachar...

arxiv preprint - Training Chain-of-Thought via Latent-Variable Inference

22 Dec 2023

Contributed by Lukas

In this episode we discuss Training Chain-of-Thought via Latent-Variable Inference by Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tu...

arxiv preprint - Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

21 Dec 2023

Contributed by Lukas

In this episode we discuss Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation by Bingxin Ke, Anton Obukhov, Shengyu Huang...

arxiv preprint - Instruction-tuning Aligns LLMs to the Human Brain

20 Dec 2023

Contributed by Lukas

In this episode we discuss Instruction-tuning Aligns LLMs to the Human Brain by Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimp...

arxiv preprint - WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

19 Dec 2023

Contributed by Lukas

In this episode we discuss WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia by Sina J. Sem...

arxiv preprint - DemoFusion: Democratising High-Resolution Image Generation With No $$$

18 Dec 2023

Contributed by Lukas

In this episode we discuss DemoFusion: Democratising High-Resolution Image Generation With No $$$ by Ruoyi Du, Dongliang Chang, Timothy Hospedales...

arxiv preprint - Recommender Systems with Generative Retrieval

15 Dec 2023

Contributed by Lukas

In this episode we discuss Recommender Systems with Generative Retrieval by Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, T...

arxiv preprint - Mamba: Linear-Time Sequence Modeling with Selective State Spaces

14 Dec 2023

Contributed by Lukas

In this episode we discuss Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu, Tri Dao. The paper presents Mamba, an in...

arxiv preprint - Block-State Transformers

13 Dec 2023

Contributed by Lukas

In this episode we discuss Block-State Transformers by Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshi...

arxiv preprint - Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns

12 Dec 2023

Contributed by Lukas

In this episode we discuss Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns by Brian DuSell, David Chiang. Th...

arxiv preprint - LooseControl: Lifting ControlNet for Generalized Depth Conditioning

11 Dec 2023

Contributed by Lukas

In this episode we discuss LooseControl: Lifting ControlNet for Generalized Depth Conditioning by Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka....

Announcement: AI Breakdown YouTube Channel

08 Dec 2023

Contributed by Lukas

Welcome back to AI Breakdown! In this special announcement, your hosts Megan and Ray share exciting news - we're expanding to YouTube! This new platfo...

arxiv preprint - OneLLM: One Framework to Align All Modalities with Language

08 Dec 2023

Contributed by Lukas

In this episode we discuss OneLLM: One Framework to Align All Modalities with Language by Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Ka...

arxiv preprint - The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

08 Dec 2023

Contributed by Lukas

In this episode we discuss The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning by Bill Yuchen Lin, Abhilasha Ravichande...

arxiv - MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

07 Dec 2023

Contributed by Lukas

In this episode, we discuss MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI by Xiang Yue, Yuansheng N...

arxiv preprint - MLP-Mixer: An all-MLP Architecture for Vision

07 Dec 2023

Contributed by Lukas

In this episode we discuss MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiao...

arxiv preprint - Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

06 Dec 2023

Contributed by Lukas

In this episode we discuss Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine by Harsha Nori, Yin Tat Lee,...

arxiv preprint - Nash Learning from Human Feedback

05 Dec 2023

Contributed by Lukas

In this episode we discuss Nash Learning from Human Feedback by Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland...

arxiv preprint - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

04 Dec 2023

Contributed by Lukas

In this episode we discuss Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation by Li Hu, Xin Gao, Peng Zh...

arxiv preprint - Knowledge is a Region in Weight Space for Fine-tuned Language Models

03 Dec 2023

Contributed by Lukas

In this episode we discuss Knowledge is a Region in Weight Space for Fine-tuned Language Models by Almog Gueta, Elad Venezian, Colin Raffel, Noam ...

arxiv preprint - MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

02 Dec 2023

Contributed by Lukas

In this episode we discuss MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training by Pavan Kumar Anasosalu Vasu, Hadi Pouransa...

arxiv preprint - Simplifying Transformer Blocks

01 Dec 2023

Contributed by Lukas

In this episode we discuss Simplifying Transformer Blocks by Bobby He, Thomas Hofmann. The paper studies the possibility of simplifying standard t...

arxiv - Visual In-Context Prompting

30 Nov 2023

Contributed by Lukas

In this episode, we discuss Visual In-Context Prompting by Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang L...

Arxiv Preprint - GAIA: a benchmark for General AI Assistants

29 Nov 2023

Contributed by Lukas

In this episode we discuss GAIA: a benchmark for General AI Assistants by Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann L...

Arxiv Preprint - DisCo: Disentangled Control for Realistic Human Dance Generation

28 Nov 2023

Contributed by Lukas

In this episode we discuss DisCo: Disentangled Control for Realistic Human Dance Generation by Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung...

Arxiv Preprint - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

27 Nov 2023

Contributed by Lukas

In this episode we discuss Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation by Eric Zelikman, Eliana Lorch, Lester Mackey,...

Arxiv Preprint - A General Theoretical Paradigm to Understand Learning from Human Preferences

25 Nov 2023

Contributed by Lukas

In this episode we discuss A General Theoretical Paradigm to Understand Learning from Human Preferences by Mohammad Gheshlaghi Azar, Mark Rowland,...

Arxiv Preprint - ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

22 Nov 2023

Contributed by Lukas

In this episode we discuss ShareGPT4V: Improving Large Multi-Modal Models with Better Captions by Lin Chen, Jisong Li, Xiaoyi Dong, Pan Zhang, Con...

ArXiv Preprint - S-LoRA: Serving Thousands of Concurrent LoRA Adapters

21 Nov 2023

Contributed by Lukas

In this episode we discuss S-LoRA: Serving Thousands of Concurrent LoRA Adapters by Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Le...

ArXiv Preprint - Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

20 Nov 2023

Contributed by Lukas

In this episode we discuss Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities by AJ Piergiovanni, Isaac Noble...

Arxiv Preprint - LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

17 Nov 2023

Contributed by Lukas

In this episode we discuss LCM-LoRA: A Universal Stable-Diffusion Acceleration Module by Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Pl...

ArXiv Preprint - Fine-tuning Language Models for Factuality

16 Nov 2023

Contributed by Lukas

In this episode we discuss Fine-tuning Language Models for Factuality by Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelse...

arxiv preprint - Language Models can be Logical Solvers

15 Nov 2023

Contributed by Lukas

In this episode we discuss Language Models can be Logical Solvers by Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zha...

ArXiv Preprint - Prompt Engineering a Prompt Engineer

14 Nov 2023

Contributed by Lukas

In this episode we discuss Prompt Engineering a Prompt Engineer by Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani. The paper presents PE2...

arxiv preprint - CogVLM: Visual Expert for Pretrained Language Models

13 Nov 2023

Contributed by Lukas

In this episode we discuss CogVLM: Visual Expert for Pretrained Language Models by Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wa...

ArXiv Preprint - De-Diffusion Makes Text a Strong Cross-Modal Interface

10 Nov 2023

Contributed by Lukas

In this episode we discuss De-Diffusion Makes Text a Strong Cross-Modal Interface by Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuill...

ArXiv Preprint - E3 TTS: Easy End-to-End Diffusion-based Text to Speech

09 Nov 2023

Contributed by Lukas

In this episode we discuss E3 TTS: Easy End-to-End Diffusion-based Text to Speech by Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen. The paper ...

ArXiv Preprint - Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges

08 Nov 2023

Contributed by Lukas

In this episode we discuss Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges by Chenhang Cui, Yiyang Zhou, Xin...

ArXiv Preprint - Learning From Mistakes Makes LLM Better Reasoner

07 Nov 2023

Contributed by Lukas

In this episode we discuss Learning From Mistakes Makes LLM Better Reasoner by Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, W...

ArXiv Preprint - The Generative AI Paradox: “What It Can Create, It May Not Understand”

06 Nov 2023

Contributed by Lukas

In this episode we discuss The Generative AI Paradox: "What It Can Create, It May Not Understand" by Peter West, Ximing Lu, Nouha Dziri, Faeze Bra...

ArXiv Preprint - TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

03 Nov 2023

Contributed by Lukas

In this episode we discuss TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise by Nan He, Hanyu Lai, Chenyang Zhao...

ArXiv Preprint - MM-VID: Advancing Video Understanding with GPT-4V(ision)

02 Nov 2023

Contributed by Lukas

In this episode we discuss MM-VID: Advancing Video Understanding with GPT-4V(ision) by Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan ...

ArXiv Preprint - Zephyr: Direct Distillation of LM Alignment

01 Nov 2023

Contributed by Lukas

In this episode we discuss Zephyr: Direct Distillation of LM Alignment by Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif ...

ArXiv Preprint - ControlLLM: Augment Language Models with Tools by Searching on Graphs

31 Oct 2023

Contributed by Lukas

In this episode we discuss ControlLLM: Augment Language Models with Tools by Searching on Graphs by Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei...

ArXiv Preprint - Talk like a Graph: Encoding Graphs for Large Language Models

30 Oct 2023

Contributed by Lukas

In this episode we discuss Talk like a Graph: Encoding Graphs for Large Language Models by Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi. The pap...

arxiv Preprint - AgentTuning: Enabling Generalized Agent Abilities for LLMs

29 Oct 2023

Contributed by Lukas

In this episode we discuss AgentTuning: Enabling Generalized Agent Abilities for LLMs by Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yu...

ArXiv Preprint - Jailbreaking Black Box Large Language Models in Twenty Queries

28 Oct 2023

Contributed by Lukas

In this episode we discuss Jailbreaking Black Box Large Language Models in Twenty Queries by Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed ...

ArXiv Preprint - Matryoshka Diffusion Models

27 Oct 2023

Contributed by Lukas

In this episode we discuss Matryoshka Diffusion Models by Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly. The paper introdu...

arxiv Preprint - An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning

26 Oct 2023

Contributed by Lukas

In this episode we discuss An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning by Chen Jin, Ryuta...

arxiv Preprint - Retrieval meets Long Context Large Language Models

25 Oct 2023

Contributed by Lukas

In this episode we discuss Retrieval meets Long Context Large Language Models by Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan ...

arxiv Preprint - Contrastive Preference Learning: Learning from Human Feedback without RL

24 Oct 2023

Contributed by Lukas

In this episode we discuss Contrastive Preference Learning: Learning from Human Feedback without RL by Joey Hejna, Rafael Rafailov, Harshit Sikchi, ...

arxiv Preprint - BitNet: Scaling 1-bit Transformers for Large Language Models

23 Oct 2023

Contributed by Lukas

In this episode we discuss BitNet: Scaling 1-bit Transformers for Large Language Models by Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaiji...

arxiv Preprint - Automatic Prompt Optimization with “Gradient Descent” and Beam Search

22 Oct 2023

Contributed by Lukas

In this episode we discuss Automatic Prompt Optimization with "Gradient Descent" and Beam Search by Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee,...

arxiv Preprint - Understanding Retrieval Augmentation for Long-Form Question Answering

21 Oct 2023

Contributed by Lukas

In this episode we discuss Understanding Retrieval Augmentation for Long-Form Question Answering by Hung-Ting Chen, Fangyuan Xu, Shane A. Arora, E...

arxiv Preprint - On the Connection between Pre-training Data Diversity and Fine-tuning Robustness

20 Oct 2023

Contributed by Lukas

In this episode we discuss On the Connection between Pre-training Data Diversity and Fine-tuning Robustness by Vivek Ramanujan, Thao Nguyen, Sewoo...

arxiv Preprint - Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness

19 Oct 2023

Contributed by Lukas

In this episode we discuss Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness by Felix Friedrich, Manuel Brack, Lukas Struppe...

arxiv Preprint - In-Context Pretraining: Language Modeling Beyond Document Boundaries

18 Oct 2023

Contributed by Lukas

In this episode we discuss In-Context Pretraining: Language Modeling Beyond Document Boundaries by Weijia Shi, Sewon Min, Maria Lomeli, Chunting Z...

ICCV 2023 - Sigmoid Loss for Language Image Pre-Training

17 Oct 2023

Contributed by Lukas

In this episode we discuss Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer. The pap...

arxiv Preprint - Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

16 Oct 2023

Contributed by Lukas

In this episode we discuss Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading by Howard Chen, Ramakanth Pasunuru, Jaso...

arxiv Preprint - HyperAttention: Long-context Attention in Near-Linear Time

15 Oct 2023

Contributed by Lukas

In this episode we discuss HyperAttention: Long-context Attention in Near-Linear Time by Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, D...

arxiv Preprint - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists

13 Oct 2023

Contributed by Lukas

In this episode we discuss InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists by Yulu Gan, Sungwoo Park, Alexander...

arxiv Preprint - Large Language Models Cannot Self-Correct Reasoning Yet

12 Oct 2023

Contributed by Lukas

In this episode we discuss Large Language Models Cannot Self-Correct Reasoning Yet by Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng...

arxiv Preprint - Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

11 Oct 2023

Contributed by Lukas

In this episode we discuss Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution by Chrisantha Fernando, Dylan Banarse, Henryk Mic...

arxiv Preprint - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

10 Oct 2023

Contributed by Lukas

In this episode we discuss Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation by Eric Zelikman, Eliana Lorch, Lester Mackey,...

arxiv Preprint - Improved Baselines with Visual Instruction Tuning

09 Oct 2023

Contributed by Lukas

In this episode we discuss Improved Baselines with Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee. The authors pro...

arxiv Preprint - Tree of Thoughts: Deliberate Problem Solving with Large Language Models

08 Oct 2023

Contributed by Lukas

In this episode we discuss Tree of Thoughts: Deliberate Problem Solving with Large Language Models by Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Sha...

NeurIPS 2023 - Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

07 Oct 2023

Contributed by Lukas

In this episode we discuss Evaluating Cognitive Maps and Planning in Large Language Models with CogEval by Ida Momennejad, Hosein Hasanbeig, Felip...

ICCV 2023 - Diffusion Models as Masked Autoencoders

06 Oct 2023

Contributed by Lukas

In this episode we discuss Diffusion Models as Masked Autoencoders by Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, H...

arxiv Preprint - Conditional Diffusion Distillation

05 Oct 2023

Contributed by Lukas

In this episode we discuss Conditional Diffusion Distillation by Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, P...

arxiv Preprint - Enable Language Models to Implicitly Learn Self-Improvement From Data

04 Oct 2023

Contributed by Lukas

In this episode we discuss Enable Language Models to Implicitly Learn Self-Improvement From Data by Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yun...

arxiv Preprint - Efficient Streaming Language Models with Attention Sinks

03 Oct 2023

Contributed by Lukas

In this episode we discuss Efficient Streaming Language Models with Attention Sinks by Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike L...

NeurIPS 2023 - PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving

02 Oct 2023

Contributed by Lukas

In this episode we discuss PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving by Sepidehsadat Hosseini, Mohammad Am...

arxiv Preprint - Vision Transformers Need Registers

01 Oct 2023

Contributed by Lukas

In this episode we discuss Vision Transformers Need Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski. The paper discus...

arxiv Preprint - VPA: Fully Test-Time Visual Prompt Adaptation

30 Sep 2023

Contributed by Lukas

In this episode we discuss VPA: Fully Test-Time Visual Prompt Adaptation by Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, ...
