
AIandBlockchain

AI Under Attack: How HarmBench Tests LLM Safety

17 Feb 2025

Description

How safe are large language models like ChatGPT and Google’s Gemini? In this episode, we dive into groundbreaking research on AI safety and explore HarmBench, a standardized evaluation framework designed to stress-test LLMs against harmful manipulation. With 18 red-teaming attack methods tested across 33 models, the study reveals surprising vulnerabilities and promising defenses. We break down red teaming, contextual attacks, and R2D2, an adversarial training defense that could make AI more resilient. Can LLMs ever be truly safe? Join us as we tackle the risks, defenses, and ethical responsibilities shaping the future of AI.

Link: https://arxiv.org/pdf/2402.04249
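For listeners curious how a benchmark like HarmBench actually quantifies vulnerability, the core metric is the attack success rate (ASR): the fraction of harmful behaviors a red-teaming method manages to elicit from a target model. The sketch below is a minimal, hypothetical illustration of that evaluation loop in Python; the callables (attack, model, classifier) are placeholder assumptions, not HarmBench's real API.

```python
# Minimal sketch of an attack-success-rate (ASR) evaluation loop,
# in the spirit of benchmarks like HarmBench. All helpers passed in
# are hypothetical placeholders, not the actual HarmBench interface.

from typing import Callable

def attack_success_rate(
    behaviors: list[str],                    # harmful behaviors to elicit
    attack: Callable[[str], str],            # red-teaming method: behavior -> adversarial prompt
    model: Callable[[str], str],             # target LLM: prompt -> completion
    classifier: Callable[[str, str], bool],  # judge: (behavior, completion) -> harmful?
) -> float:
    """Return the fraction of behaviors the attack elicits from the model."""
    successes = 0
    for behavior in behaviors:
        prompt = attack(behavior)                # craft an adversarial prompt
        completion = model(prompt)               # query the target model
        if classifier(behavior, completion):     # did the model comply?
            successes += 1
    return successes / len(behaviors)
```

A study like the one discussed would run this loop for every (attack method, model) pair, e.g. 18 methods by 33 models, and compare ASRs with and without defenses such as adversarial training.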


Transcription

This episode hasn't been transcribed yet.


Comments

There are no comments yet.
