Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

AI Breakdown

arxiv preprint - NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING

07 Oct 2024

Description

In this episode, we discuss NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING by The authors of the paper "NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING" are: - Arsha Nagrani - Mingda Zhang - Ramin Mehran - Rachel Hornung - Nitesh Bharadwaj Gundavarapu - Nilpa Jha - Austin Myers - Xingyi Zhou - Boqing Gong - Cordelia Schmid - Mikhail Sirotenko - Yukun Zhu - Tobias Weyand. The paper introduces "Neptune," a semi-automatic system designed to generate complex question-answer-decoy sets from long video content to enhance comprehension tasks typically limited to short clips. Leveraging large models like Vision-Language Models and Large Language Models, Neptune creates detailed, time-aligned captions and intricate QA sets for videos up to 15 minutes long, aiming to improve annotation efficiency. The dataset emphasizes multimodal reasoning and introduces the GEM metric for evaluating responses, revealing current long video models' weaknesses in understanding temporal and state changes.

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.