arXiv preprint - A Simple LLM Framework for Long-Range Video Question-Answering - AI Breakdown

Description

In this episode, we discuss A Simple LLM Framework for Long-Range Video Question-Answering by Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius. LLoVi is a framework for long-range video question-answering (LVQA) that combines visual captioners with Large Language Models (LLMs) such as GPT-3.5 or GPT-4, rather than relying on complex long-range video modeling structures. Short clips taken from a long video are captioned, and an LLM then aggregates those captions to answer questions about the entire video. In benchmarks, LLoVi outperformed the previous best-performing approaches on several datasets, including EgoSchema, NeXT-QA, IntentQA, and NeXT-GQA. The authors state that the code for LLoVi will be made publicly available.
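The caption-then-aggregate idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (split_into_clips, build_prompt, answer_long_video), the prompt wording, and the fixed clip length are all hypothetical, and the captioner and the LLM are passed in as plain callables so that no particular model or API is assumed.

```python
from typing import Callable, List, Sequence


def split_into_clips(num_frames: int, clip_len: int) -> List[range]:
    """Split frame indices [0, num_frames) into consecutive short clips."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [
        range(start, min(start + clip_len, num_frames))
        for start in range(0, num_frames, clip_len)
    ]


def build_prompt(captions: Sequence[str], question: str) -> str:
    """Join per-clip captions, in temporal order, into one LLM prompt.

    The prompt wording is an assumption made for this sketch.
    """
    lines = [f"[clip {i}] {caption}" for i, caption in enumerate(captions)]
    return (
        "Captions of consecutive short clips from one long video:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


def answer_long_video(
    num_frames: int,
    clip_len: int,
    question: str,
    caption_clip: Callable[[range], str],  # hypothetical visual captioner
    ask_llm: Callable[[str], str],         # hypothetical LLM call
) -> str:
    """Caption each short clip, then let the LLM answer over all captions."""
    clips = split_into_clips(num_frames, clip_len)
    captions = [caption_clip(clip) for clip in clips]
    return ask_llm(build_prompt(captions, question))
```

In practice, caption_clip would wrap a visual captioning model and ask_llm would wrap a call to an LLM such as GPT-3.5 or GPT-4. Keeping both as injected callables means the long-range reasoning lives entirely in the text prompt, which is the point the description makes about avoiding a dedicated long-range video model.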



