AI Breakdown

arxiv preprint - Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

28 Dec 2023

Description

In this episode we discuss Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution by Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby. The paper introduces NaViT (Native Resolution ViT). Traditional computer vision models resize every image to a fixed resolution before processing; NaViT instead uses sequence packing during training to handle inputs of arbitrary resolution and aspect ratio. The authors report improved training efficiency and show that the model transfers to standard tasks such as image and video classification, object detection, and semantic segmentation, with improved results on robustness and fairness benchmarks. At test time, the input resolution can be varied to trade performance against cost. The approach departs from the fixed-resolution, CNN-derived input pipeline that most computer vision models use.
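
To make "sequence packing" concrete, here is a minimal pure-Python sketch of the general idea, not the authors' implementation. Images of different sizes produce different numbers of patch tokens, several images are packed into one fixed-length sequence, and a mask keeps tokens from attending across image boundaries. The patch size of 16, the sequence length of 256, the first-fit packing rule, and all function names are assumptions chosen for illustration.

```python
from typing import List

def num_patches(height: int, width: int, patch: int = 16) -> int:
    """Token count for an image split into non-overlapping patch x patch tiles.
    Assumes height and width are multiples of `patch`; leftover pixels are ignored."""
    return (height // patch) * (width // patch)

def pack_first_fit(lengths: List[int], max_len: int) -> List[List[int]]:
    """Put each image index into the first packed sequence that still has room."""
    bins: List[List[int]] = []   # image indices per packed sequence
    free: List[int] = []         # remaining token capacity per packed sequence
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"image {idx} needs {n} tokens, more than max_len={max_len}")
        for b, room in enumerate(free):
            if n <= room:
                bins[b].append(idx)
                free[b] -= n
                break
        else:  # no existing sequence had room, so open a new one
            bins.append([idx])
            free.append(max_len - n)
    return bins

def segment_ids(members: List[int], lengths: List[int], max_len: int) -> List[int]:
    """Per-token image index for one packed sequence; -1 marks padding tokens."""
    ids: List[int] = []
    for idx in members:
        ids.extend([idx] * lengths[idx])
    ids.extend([-1] * (max_len - len(ids)))
    return ids

def attention_mask(ids: List[int]) -> List[List[bool]]:
    """True where a query token may attend to a key token: same image, not padding."""
    return [[q == k and q != -1 for k in ids] for q in ids]

# Four hypothetical images as (height, width), kept at their native aspect ratios.
sizes = [(224, 224), (128, 320), (64, 64), (256, 96)]
lengths = [num_patches(h, w) for h, w in sizes]
packed = pack_first_fit(lengths, max_len=256)
print(lengths)  # token count per image
print(packed)   # which images share each packed sequence
```

In this sketch, each packed sequence would be passed to the transformer together with the mask from `attention_mask`, so that the images sharing a sequence are processed independently of one another.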
