Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

AI Breakdown

arxiv preprint - Training Chain-of-Thought via Latent-Variable Inference

22 Dec 2023

Description

In this episode we discuss Training Chain-of-Thought via Latent-Variable Inference by Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous. The paper introduces a fine-tuning strategy for large language models that improves their problem-solving accuracy by focusing on maximizing the probability of correct answers using chain-of-thought (CoT) prompts without requiring detailed rationale supervision. It tackles the challenge of sampling from the posterior distribution of possible rationales with a novel Markov-chain Monte Carlo (MCMC) expectation-maximization (EM) algorithm, which also incorporates a control-variate technique to reduce variance in gradient estimates. The method outperforms existing fine-tuning methods, including the self-taught reasoner (STaR) and prompt-tuning with CoT, in generating more accurate answers on various complex reasoning tasks.

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.