
AI Breakdown

arxiv preprint - Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

26 Jan 2024

Description

In this episode, we discuss "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video" by Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, and Yannis Avrithis. The paper presents two contributions to self-supervised learning: a new dataset called "Walking Tours," which consists of high-resolution, long-duration, first-person videos well suited to self-supervision, and a novel pretraining method called DORA, which uses transformer cross-attention to track objects across video frames and learn to recognize them. Rather than adapting image-based pretraining to videos, the method focuses on tracking objects over time. The researchers found that pretraining with DORA on the Walking Tours dataset performed comparably to ImageNet pretraining on various image and video recognition tasks, which demonstrates the data efficiency of the approach.
