AI Post Transformers

THOR: Hierarchical RL for Mathematical Reasoning

19 Sep 2025


Description

This September 2025 paper presents THOR (Tool-Integrated Hierarchical Optimization via RL), an approach for improving the mathematical reasoning and code generation of large language models (LLMs) by integrating external code-execution tools. It introduces TIRGen, a pipeline for building high-quality Tool-Integrated Reasoning (TIR) data, which is essential for training the model with a hierarchical reinforcement learning (RL) strategy. The RL framework combines trajectory-level optimization, which targets overall problem-solving ability, with step-level optimization, which targets errors in generated code. Together they address the sparse-reward problem common in long reasoning tasks. The authors report state-of-the-art (SOTA) performance across a range of mathematical and code benchmarks for both reasoning and non-reasoning models. Finally, THOR uses code execution feedback for self-correction at inference time, which further improves performance, especially on harder problems.

Source: https://arxiv.org/pdf/2509.13761
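Self-correction driven by code execution feedback can be pictured as an execute-and-retry loop. The Python sketch below is a minimal, generic illustration under stated assumptions, not the paper's implementation: `toy_generator` is a hypothetical stand-in for an LLM, and `run_snippet` and `solve_with_self_correction` are illustrative names. Each failed execution's error message is appended to the context so that the next attempt can condition on it.

```python
import contextlib
import io
from typing import Callable, Optional


def run_snippet(code: str) -> tuple[bool, str]:
    """Execute a code string in a fresh namespace; return (success, stdout or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Illustration only: real systems run model-written code in a sandbox.
            exec(code, {})
        return True, buffer.getvalue()
    except Exception as exc:  # any runtime error becomes feedback text
        return False, f"{type(exc).__name__}: {exc}"


def solve_with_self_correction(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Regenerate code after failed executions, feeding each error back into the context."""
    context = prompt
    for _ in range(max_attempts):
        code = generate(context)
        ok, output = run_snippet(code)
        if ok:
            return output.strip()
        # Append the execution error so the next generation can react to it.
        context += f"\n# Previous attempt failed with: {output}\n"
    return None  # every attempt failed


def toy_generator(context: str) -> str:
    """Hypothetical LLM stand-in: forgets an import first, fixes it after seeing the error."""
    if "NameError" in context:
        return "import math\nprint(math.comb(10, 3))"
    return "print(math.comb(10, 3))"


print(solve_with_self_correction(toy_generator, "How many ways to choose 3 of 10 items?"))
```

Running it prints 120: the first attempt raises a NameError because `math` was never imported, and the retry that sees that error succeeds. The `max_attempts` cap keeps a model that never recovers from looping forever.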



