
AI Safety Newsletter

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight.

01 Aug 2023

Description

Automatically Circumventing LLM Guardrails

Large language models (LLMs) can generate hazardous information, such as step-by-step instructions on how to create a pandemic pathogen. To combat the risk of malicious use, companies typically build safety guardrails intended to prevent LLMs from misbehaving. But these safety controls are almost useless against a new attack developed by researchers at Carnegie Mellon University and the Center for AI Safety. By studying the vulnerabilities in open source models such as Meta’s LLaMA 2, the researchers can automatically generate a nearly unlimited supply of “adversarial suffixes,” which are words and characters that cause any model’s safety controls to fail. This discovery calls into question the fundamental limits of safety [...]

Outline:
(00:12) Automatically Circumventing LLM Guardrails
(05:40) AI Labs Announce the Frontier Model Forum
(07:54) Senate Hearing on AI Oversight
(14:42) Links

First published: August 1st, 2023
Source: https://newsletter.safe.ai/p/ai-safety-newsletter-17

Want more? Check out our ML Safety Newsletter for technical safety research.

Narrated by TYPE III AUDIO.
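
For readers unfamiliar with the attack's shape, the following is a minimal, hypothetical sketch (not the researchers' code) of what an "adversarial suffix" is: a string of optimizer-chosen tokens appended to an otherwise ordinary request. The function name and the suffix shown are placeholders for illustration only; the real suffixes are found automatically by a search procedure run against open-source models and were shown to transfer to other models.

# Hypothetical illustration of the prompt structure described above.
# This is not the researchers' implementation. The suffix below is a
# placeholder: real suffixes are token sequences discovered automatically
# by an optimization procedure run against open-source models.

def build_attack_prompt(user_request: str, adversarial_suffix: str) -> str:
    """Append an optimized suffix to an otherwise ordinary request."""
    return f"{user_request} {adversarial_suffix}"

# Placeholder only; no real adversarial string is reproduced here.
suffix_placeholder = "<token sequence chosen by the search procedure>"
print(build_attack_prompt("Describe your safety guidelines.", suffix_placeholder))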
