
AI Podcast

ProRL: Prolonged Reinforcement Learning Expands the Reasoning Boundaries of Large Language Models

03 Jun 2025

Description

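An in-depth look at how ProRL (Prolonged Reinforcement Learning) significantly improves the reasoning ability of large language models by extending reinforcement learning training and combining it with KL divergence control, reference policy resets, and a diverse set of tasks, even uncovering entirely new solution strategies that the base model cannot reach. This episode covers ProRL's technical details, the striking performance of the Nemotron-Research-Reasoning-Qwen-1.5B model, and the far-reaching implications for the future of AI.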

