
AI Podcast

A Deep Dive into DAPO: An Open-Source LLM Reinforcement Learning System at Scale

02 Jun 2025

Description

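This episode takes an in-depth look at DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), a state-of-the-art large-scale reinforcement learning system that scores 50 points on AIME 2024 using the Qwen2.5-32B base model. We discuss its four key techniques in detail: Clip-Higher, dynamic sampling, token-level policy gradient loss, and overlong reward shaping. We also cover how these techniques address entropy collapse, vanishing gradients, learning imbalance in long-CoT scenarios, and reward noise, and what the open-source code, training code, and carefully processed dataset contribute to the community.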
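
The four techniques named above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the DAPO training code: the function names, the clip values `eps_low=0.2` and `eps_high=0.28`, and the length thresholds in the usage lines are illustrative choices that do not come from this episode description.

```python
def clipped_token_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Clip-Higher (sketch): a PPO-style clipped surrogate whose lower and
    upper clip ranges are decoupled. The default values are assumptions."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def token_level_loss(ratios, advantages):
    """Token-level policy gradient loss (sketch): average over every token
    in the batch, so long responses are not down-weighted per sample.

    ratios:     list of per-response lists of token probability ratios
    advantages: one scalar advantage per response
    """
    total, n_tokens = 0.0, 0
    for response_ratios, adv in zip(ratios, advantages):
        for r in response_ratios:
            total += clipped_token_objective(r, adv)
            n_tokens += 1
    return -total / n_tokens


def keep_group(rewards):
    """Dynamic sampling (sketch): drop a prompt whose sampled responses all
    got the same reward, since its group-relative advantages are all zero."""
    return len(set(rewards)) > 1


def overlong_penalty(length, max_len, cache_len):
    """Overlong reward shaping (sketch): no penalty up to max_len - cache_len,
    a linear penalty inside the buffer, and -1 beyond max_len."""
    soft_limit = max_len - cache_len
    if length <= soft_limit:
        return 0.0
    if length <= max_len:
        return (soft_limit - length) / cache_len
    return -1.0


# Hypothetical usage with made-up numbers.
loss = token_level_loss([[1.0, 1.0], [1.0]], [1.0, -1.0])
groups = [[1, 1, 1], [1, 0, 1], [0, 0, 0]]
kept = [g for g in groups if keep_group(g)]
penalty = overlong_penalty(175, max_len=200, cache_len=50)
```

In this sketch, raising only the upper clip bound leaves more room to increase the probability of low-probability tokens, which is the intuition behind using Clip-Higher against entropy collapse.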
