Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

AI Breakdown

Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling

27 Feb 2025

Description

In this episode, we discuss PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling by Avery Ma, Yangchen Pan, Amir-massoud Farahmand. The paper introduces PANDAS, a hybrid technique that enhances many-shot jailbreaking by altering fabricated dialogues with positive affirmations, negative demonstrations, and optimized adaptive sampling tailored to specific prompts. Experimental results on AdvBench and HarmBench using advanced large language models show that PANDAS significantly outperforms existing baseline methods in scenarios involving long input contexts. Additionally, an attention analysis highlights how PANDAS exploits long-context vulnerabilities, providing deeper insights into the mechanics of many-shot jailbreaking.

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.