arxiv preprint - NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING - AI Breakdown | Transcription & Insights

Audio

Description

In this episode, we discuss NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING by The authors of the paper "NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING" are: - Arsha Nagrani - Mingda Zhang - Ramin Mehran - Rachel Hornung - Nitesh Bharadwaj Gundavarapu - Nilpa Jha - Austin Myers - Xingyi Zhou - Boqing Gong - Cordelia Schmid - Mikhail Sirotenko - Yukun Zhu - Tobias Weyand. The paper introduces "Neptune," a semi-automatic system designed to generate complex question-answer-decoy sets from long video content to enhance comprehension tasks typically limited to short clips. Leveraging large models like Vision-Language Models and Large Language Models, Neptune creates detailed, time-aligned captions and intricate QA sets for videos up to 15 minutes long, aiming to improve annotation efficiency. The dataset emphasizes multimodal reasoning and introduces the GEM metric for evaluating responses, revealing current long video models' weaknesses in understanding temporal and state changes.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes

🗳️ Sign in to Upvote

Popular episodes get transcribed faster

AI Breakdown

arxiv preprint - NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING

This episode hasn't been transcribed yet

Other recent transcribed episodes

13:00H | 21 DIC 2025 | Fin de Semana

10:00H | 21 DIC 2025 | Fin de Semana

12:00H | 20 DIC 2025 | Fin de Semana

2ª PARTE | 06 ENE 2026 | EL PARTIDAZO DE COPE

3ª PARTE | 22 ENE 2026 | EL PARTIDAZO DE COPE

3ª PARTE | 04 MAR 2026 | EL PARTIDAZO DE COPE

Sign in to Audioscrape

Share this moment