The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722
10 Mar 2025
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments—maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design. The complete show notes for this episode can be found at https://twimlai.com/go/722.
No persons identified in this episode.
This episode hasn't been transcribed yet
Help us prioritize this episode for transcription by upvoting it.
Popular episodes get transcribed faster
Other recent transcribed episodes
Transcribed and ready to explore now
NPR News: 12-08-2025 2AM EST
08 Dec 2025
NPR News Now
NPR News: 12-07-2025 11PM EST
08 Dec 2025
NPR News Now
NPR News: 12-07-2025 10PM EST
08 Dec 2025
NPR News Now
Meidas Health: AAP President Strongly Pushes Back on Hepatitis B Vaccine Changes
08 Dec 2025
The MeidasTouch Podcast
Democrat Bobby Cole Discusses Race for Texas Governor
07 Dec 2025
The MeidasTouch Podcast
Fox News Crashes Out on Air Over Trump’s Rapid Fall
07 Dec 2025
The MeidasTouch Podcast