
AI: post transformers

THOR: Hierarchical RL for Mathematical Reasoning

19 Sep 2025

Description

This September 2025 paper describes THOR (Tool-Integrated Hierarchical Optimization via RL), a novel approach designed to enhance the mathematical reasoning and code generation capabilities of Large Language Models (LLMs) by integrating external code-execution tools. The methodology introduces TIRGen, a pipeline for creating high-quality Tool-Integrated Reasoning (TIR) data, which is crucial for training the model using a hierarchical reinforcement learning (RL) strategy. This RL framework incorporates both trajectory-level optimization for overall problem-solving ability and step-level optimization to correct code generation errors, addressing the sparse reward problem common in long reasoning tasks. Experimental results demonstrate that THOR achieves state-of-the-art (SOTA) performance across various mathematical and code benchmarks for both reasoning and non-reasoning models. Finally, the system leverages code execution feedback for a self-correction inference enhancement mechanism, which further improves performance, especially on more challenging problems.

Source: https://arxiv.org/pdf/2509.13761
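To make the self-correction idea concrete, here is a minimal Python sketch of an inference loop that runs generated code and feeds any execution error back into the next attempt. It is an illustration under stated assumptions, not the paper's implementation: `run_snippet`, `solve_with_self_correction`, and `toy_propose` are hypothetical names, the "model" is a hard-coded stand-in, and the description above does not specify how THOR structures its retries.

```python
import contextlib
import io
from typing import Callable, Optional, Tuple


def run_snippet(code: str) -> Tuple[bool, str]:
    """Execute a code string; return (succeeded, captured stdout or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: a real system would sandbox this
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue().strip()


def solve_with_self_correction(
    propose: Callable[[str, Optional[str]], str],
    problem: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask `propose` for code, run it, and retry with the error as feedback.

    `propose` is a hypothetical stand-in for a model call that takes the
    problem and the previous execution error (or None on the first try).
    Returns (output or None, number of attempts used).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = propose(problem, feedback)
        ok, output = run_snippet(code)
        if ok:
            return output, attempt
        feedback = output  # the error message guides the next attempt
    return None, max_attempts


def toy_propose(problem: str, feedback: Optional[str]) -> str:
    """Hard-coded stand-in for a model: fails first, then corrects itself."""
    if feedback is None:
        return "print(1 / 0)"  # deliberately broken first attempt
    return "print(sum(range(1, 11)))"


result = solve_with_self_correction(toy_propose, "Sum the integers from 1 to 10.")
print(result)
```

In this toy run the first attempt raises a `ZeroDivisionError`, the error text is passed back to the proposer, and the second attempt succeeds. The success or failure of each individual execution is also the kind of signal a step-level training objective could use, while the final answer's correctness corresponds to a trajectory-level signal; how THOR actually defines those rewards is not covered in this description.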
