AI Podcast
Episodes
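DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning
24 Jan 2025
Contributed by Lukas
This podcast takes an in-depth look at the DeepSeek-R1 model, which significantly improves the reasoning ability of large language models through large-scale reinforcement learning. We will analyze DeepSee...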
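DistServe: Disaggregated Prefill and Decoding for High-Throughput Large Language Model Serving
23 Jan 2025
Contributed by Lukas
This podcast discusses DistServe, a system that improves large language model (LLM) serving performance by separating the prefill and decoding computations. We take a deep dive into...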
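Efficiency Optimization for Large-Scale Transformer Inference
20 Jan 2025
Contributed by Lukas
This podcast explores how to efficiently deploy large Transformer models for generative inference, especially in latency-sensitive scenarios with long sequence lengths. We...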
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
19 Jan 2025
Contributed by Lukas
A podcast discussing the research paper 'Addressing Representation Collapse in Vector Quantized Models with One Linear Layer'
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
19 Jan 2025
Contributed by Lukas
A podcast discussing a new approach to audio-driven portrait animation called Sonic, focusing on global audio perception rather than visual cues. The ...
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
18 Jan 2025
Contributed by Lukas
A podcast discussing Titans: Learning to Memorize at Test Time.
AI Radio FM - Technology Channel
18 Jan 2025
Contributed by Lukas
A podcast discussing Tensor Product Attention, a novel attention mechanism for Large Language Models.
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
16 Jan 2025
Contributed by Lukas
A podcast discussing Classifier-Free Diffusion Guidance, a method for improving sample quality in diffusion models without relying on a separate class...
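MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
15 Jan 2025
Contributed by Lukas
This podcast takes a deep dive into MinMo from Alibaba's Tongyi Lab, a multimodal large language model designed for seamless voice interaction. We discuss...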
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
15 Jan 2025
Contributed by Lukas
A podcast discussing the paper Titans: Learning to Memorize at Test Time.
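iSTFTNet: A Fast and Lightweight Mel-Spectrogram Vocoder
09 Jan 2025
Contributed by Lukas
Explores how iSTFTNet uses the inverse short-time Fourier transform to optimize mel-spectrogram vocoders, improving speed and efficiency.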
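AI Frontiers: The Phi-4 Large Language Model Breakthrough
09 Jan 2025
Contributed by Lukas
An in-depth look at Microsoft's newly released Phi-4 large language model and its innovations in data quality, synthetic data, training methods, and post-training optimization. ...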
StyleTTS 2: Towards Human-Level Text-to-Speech
09 Jan 2025
Contributed by Lukas
A podcast discussion about the StyleTTS 2 model for text-to-speech synthesis, focusing on its innovative use of style diffusion and adversarial traini...
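WavChat: A Survey of Spoken Dialogue Models
09 Jan 2025
Contributed by Lukas
This podcast explores recent advances in spoken dialogue models, including their capabilities, representations, training paradigms, and streaming and interaction abilities.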
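Cosmos World Foundation Model Platform: The Future of Physical AI
07 Jan 2025
Contributed by Lukas
A deep dive into the NVIDIA Cosmos World Foundation Model Platform, which aims to advance physical AI by using digital twins and world models to accelerate...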
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
07 Jan 2025
Contributed by Lukas
A podcast discussing the Story-Adapter framework for long story visualization.
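AI Tech Frontiers: A Deep Dive into the Story Diffusion Model
07 Jan 2025
Contributed by Lukas
This episode explores the story diffusion model, a new method for generating coherent images and videos. We will analyze its core techniques in detail, including consistent...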
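Intelligent Grimm: Open-ended Visual Storytelling via Latent Diffusion Models
07 Jan 2025
Contributed by Lukas
This episode discusses a paper on open-ended visual storytelling with latent diffusion models. We explore the model's technical details, ...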
AI Radio FM - Technology Channel
07 Jan 2025
Contributed by Lukas
A podcast discussing the IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
07 Jan 2025
Contributed by Lukas
A podcast discussion about PowerInfer-2, a framework for running large language models on smartphones, focusing on its neuron cluster design, adaptive...
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
06 Jan 2025
Contributed by Lukas
A deep dive into LatentSync, an innovative lip-sync framework using audio-conditioned latent diffusion models, its methodology, experiments, and the r...
Infinity: Scaling Bitwise Autoregressive Modeling for High-Resolution Image Synthesis
06 Jan 2025
Contributed by Lukas
A podcast discussing the groundbreaking research paper on Infinity, a novel autoregressive model for high-resolution image synthesis.
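AI-Driven Interactive Head Generation
06 Jan 2025
Contributed by Lukas
This episode explores INFP, an audio-driven head generation framework for dyadic conversations. We will examine its novel approach, its dataset, and the experiment...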
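CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
06 Jan 2025
Contributed by Lukas
A podcast about CosyVoice 2, an improved streaming speech synthesis model that leverages large language models to achieve near-human-level natural...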
Flow Matching for Generative Modeling
06 Jan 2025
Contributed by Lukas
A podcast discussing Flow Matching (FM), a new paradigm for generative modeling with Continuous Normalizing Flows (CNFs). FM offers a simulat...
Swin Transformer: A New Vision Transformer
05 Jan 2025
Contributed by Lukas
A podcast discussing the Swin Transformer, a hierarchical vision transformer using shifted windows for computer vision tasks.
ConvNeXt: A Modern ConvNet for the 2020s
05 Jan 2025
Contributed by Lukas
A podcast discussing the architecture and performance of ConvNeXt, a modern ConvNet model that challenges the dominance of Vision Transformers.
AI Vision Podcast: Masked Autoencoders for Scalable Vision Learning
05 Jan 2025
Contributed by Lukas
A deep dive into Masked Autoencoders (MAE) and their impact on computer vision, discussing their architecture, training efficiency, and performance on...
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
04 Jan 2025
Contributed by Lukas
A podcast discussing the auxiliary-loss-free load balancing strategy for mixture-of-experts models.
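A Survey of Mixture-of-Experts (MoE) Models
04 Jan 2025
Contributed by Lukas
This podcast explores the latest advances in mixture-of-experts (MoE) models, covering algorithm design, system implementation, and practical applications. Starting from background on sparse and dense MoE...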
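Zero Bubble Pipeline Parallelism
04 Jan 2025
Contributed by Lukas
This episode explores zero bubble pipeline parallelism, an innovative technique for improving the efficiency of large-scale distributed training. We analyze...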
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
04 Jan 2025
Contributed by Lukas
A podcast discussion about GShard, a module for scaling neural networks using conditional computation and automatic sharding, focusing on its applicat...
AI Radio FM - Technology Channel: GShard and Giant Models
04 Jan 2025
Contributed by Lukas
A deep dive into GShard, a module for scaling giant neural networks, focusing on its application to multilingual machine translation and its impact on...
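A Hybrid Tensor-Expert-Data Parallelism Approach to Optimizing Mixture-of-Experts Training
04 Jan 2025
Contributed by Lukas
A deep dive into DeepSpeed-TED, a novel three-dimensional hybrid parallelism framework for training mixture-of-experts models with large base models. We discuss memory...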
Unified Sequence Parallelism: Powering Long-Context Generative AI
04 Jan 2025
Contributed by Lukas
This podcast explores Unified Sequence Parallelism (USP), an approach for training extremely long-context generative...
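LoongTrain: Efficient Training of Long-Sequence Large Language Models
04 Jan 2025
Contributed by Lukas
This episode explores LoongTrain, an efficient training framework designed for long-sequence large language models. We will discuss its core 2D-attention mechanism, as well as...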
Ring Attention with Blockwise Transformers for Near-Infinite Context
04 Jan 2025
Contributed by Lukas
A podcast discussing a novel approach to scale transformer models to handle near-infinite context lengths.
FlashAttention-3: Revolutionizing Attention Mechanisms on GPUs
04 Jan 2025
Contributed by Lukas
A podcast discussing the FlashAttention-3 algorithm, its improvements over previous versions, and its impact on large language models.
AI FlashAttention-2 Podcast
04 Jan 2025
Contributed by Lukas
A fast-paced discussion on FlashAttention-2, a faster attention mechanism for Transformers, exploring its algorithms, parallelism, and performance ben...
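FlashAttention: Fast and Memory-Efficient Exact Attention
04 Jan 2025
Contributed by Lukas
Explores the FlashAttention algorithm, a new method for fast, memory-efficient exact attention on GPUs, with an in-depth analysis of its IO complexity and a comparison with existing...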
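DeepSpeed Ulysses: System Optimizations for Training Transformer Models with Extremely Long Sequences
04 Jan 2025
Contributed by Lukas
This podcast explores DeepSpeed Ulysses, an innovative approach to training Transformer models with extremely long sequence lengths by optimizing sequence parallelism...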
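DistFlashAttn: Distributed Memory-Efficient Attention for Long-Context LLM Training
04 Jan 2025
Contributed by Lukas
This podcast explores DistFlashAttn, a distributed, memory-efficient attention mechanism designed for training long-context large language models, breaking down its core...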
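Reducing Activation Recomputation in Large Transformer Models
04 Jan 2025
Contributed by Lukas
This podcast discusses a new method for accelerating the training of large Transformer models by reducing activation recomputation. We will dive into sequence parallelism and selective...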
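Sequence Parallelism: Long Sequence Training from a System Perspective
04 Jan 2025
Contributed by Lukas
Explores a memory-efficient parallelism method called "sequence parallelism," which aims to break through the limit on input sequence length and to efficiently train on GPUs with longer...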
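AI-Driven Large-Scale Language Model Training: Efficient Megatron-LM Training on GPU Clusters
04 Jan 2025
Contributed by Lukas
This episode explores how to efficiently train large-scale language models on GPU clusters with Megatron-LM, focusing on tensor parallelism, pipeline parallelism, and data...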
AI Radio FM - Technology Channel: PagedAttention for Large Language Model Serving
04 Jan 2025
Contributed by Lukas
A podcast discussing PagedAttention, a novel memory management technique for serving large language models, and its implementation in vLLM.
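ORCA: A Distributed Serving System for Transformer-Based Generative Models
04 Jan 2025
Contributed by Lukas
This episode explores ORCA, a distributed serving system designed for Transformer models. We will cover its innovative iteration-level scheduling and selective...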
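LLM Inference Optimization: 23x Throughput with Continuous Batching
04 Jan 2025
Contributed by Lukas
This episode explores continuous batching for large language model (LLM) inference, showing how it significantly increases throughput and reduces latency. We...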
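Mooncake: A KVCache-Centric Disaggregated Architecture for LLM Serving
04 Jan 2025
Contributed by Lukas
This podcast explores Mooncake's innovative architecture, a disaggregated system designed for efficiently serving large language models.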
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
02 Jan 2025
Contributed by Lukas
A podcast discussing the InternLM-XComposer2 model, its architecture, and capabilities in free-form text-image composition and comprehension.
AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
02 Jan 2025
Contributed by Lukas
A fast-paced, enthusiastic podcast discussing the latest advancements in AI, focusing on the InternLM-XComposer2-4KHD model.
AI Radio FM - Technology Channel
02 Jan 2025
Contributed by Lukas
A podcast discussing InternLM-XComposer-2.5, a versatile large vision language model.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
02 Jan 2025
Contributed by Lukas
A podcast discussing the InternLM-XComposer2.5-OmniLive system, a novel multimodal AI for long-term video and audio interaction.
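DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
28 Dec 2024
Contributed by Lukas
This episode explores DeepSeekMoE, an innovative mixture-of-experts architecture designed to achieve ultimate specialization of expert knowledge. We will discuss its core strategies...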
DeepSeek-V3: A Deep Dive into a Powerful Mixture-of-Experts Model
27 Dec 2024
Contributed by Lukas
A podcast discussion analyzing the DeepSeek-V3 technical report, covering its architecture, training, and performance.
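E2 TTS: Surprisingly Simple Zero-Shot Text-to-Speech
27 Dec 2024
Contributed by Lukas
This episode explores E2 TTS, a fully non-autoregressive zero-shot text-to-speech system whose naturalness, speaker similarity, and intelligibility...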
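BigVGAN: A Universal Neural Vocoder with Large-Scale Training
27 Dec 2024
Contributed by Lukas
This podcast discusses BigVGAN, a universal neural vocoder that achieves high-fidelity audio synthesis through large-scale training, and its performance across a variety of out-of-distribution scenarios...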
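F5-TTS: Breakthrough Text-to-Speech Technology
27 Dec 2024
Contributed by Lukas
A deep dive into F5-TTS, a non-autoregressive text-to-speech system based on flow matching that excels at zero-shot speech synthesis.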
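Attention in Plain Terms: The Evolution and Applications of Attention Mechanisms
25 Dec 2024
Contributed by Lukas
This episode explores the evolution and applications of attention mechanisms in deep learning, from the limitations of Seq2Seq models to the innovations of the Transformer and on to Self-A...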
Speech and Language Processing
24 Dec 2024
Contributed by Lukas
A podcast discussing the content from Daniel Jurafsky and James H. Martin's "Speech and Language Processing" textbook, Third Edition dr...
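From Slow Bidirectional to Fast Causal Video Generators
23 Dec 2024
Contributed by Lukas
This podcast discusses a new video generation method that converts a pretrained bidirectional diffusion model into a causal model and combines it with distribution matching distillation...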
PaliGemma 2: A Versatile Vision-Language Model
21 Dec 2024
Contributed by Lukas
A podcast discussion about PaliGemma 2, a family of versatile vision-language models, and its capabilities in various tasks.
Byte Latent Transformer: Patches Scale Better Than Tokens
21 Dec 2024
Contributed by Lukas
A podcast discussing the Byte Latent Transformer (BLT), a novel byte-level LLM architecture that matches tokenization-based LLM performance with impro...
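AI Frontiers: A Deep Dive into the POINTS1.5 Vision-Language Model
21 Dec 2024
Contributed by Lukas
This episode explores POINTS1.5, the latest vision-language model from Tencent's WeChat AI team, offering a comprehensive analysis that spans model architecture, bilingual support, and training strategy...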
ModernBERT: A Deep Dive into Efficient Encoder Models
21 Dec 2024
Contributed by Lukas
An in-depth discussion of the ModernBERT paper, exploring its architecture, training methodology, and performance across various NLP tasks.
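Open-Sora Plan: An Open-Source Large Video Generation Model
15 Dec 2024
Contributed by Lukas
This podcast explores Open-Sora Plan, an open-source project aimed at generating high-quality, long-duration videos. We will analyze its core models, auxiliary strategies...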
Wavelet Flow VAE for Latent Video Diffusion Models
15 Dec 2024
Contributed by Lukas
A podcast discussion about Wavelet Flow VAE (WF-VAE), a novel autoencoder that leverages multi-level wavelet transforms to enhance video encoding effi...
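AI Radio FM - Technology Channel, Your Personal Generative AI Podcast
14 Dec 2024
Contributed by Lukas
In this episode, we explore POINTS1.5, a vision-language model that performs well in real-world applications.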
Gemini 2.0 Flash for Developers
12 Dec 2024
Contributed by Lukas
A podcast discussing the new Gemini 2.0 Flash model and its capabilities for developers.
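A New Era with Gemini 2.0: The Arrival of the Agentic Era
12 Dec 2024
Contributed by Lukas
Google has launched Gemini 2.0, a new AI model built for the agentic era. This podcast explores Gemini 2.0's features, applications, and Google's...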
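AI Radio FM Technology Channel: A Deep Dive into Generative Adversarial Networks (GANs)
12 Dec 2024
Contributed by Lukas
This episode explores generative adversarial networks (GANs), proposed by Ian Goodfellow et al. in 2014, explaining their principles, advantages, and applications and giving you a tour of the AI field's...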
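AI Radio FM Technology Channel: A Comprehensive Survey on Evaluating Multimodal Large Language Models
12 Dec 2024
Contributed by Lukas
This episode explores how multimodal large language models (MLLMs) are evaluated, covering benchmark types, the benchmark construction process, evaluation methods, and future...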
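AI Radio FM Technology Channel: A Survey of Multimodal Large Language Models
12 Dec 2024
Contributed by Lukas
This episode explores recent advances in multimodal large language models (MLLMs), covering architecture, training strategies, data, evaluation methods, and future development...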
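AI Radio FM - GLM-4-Voice: An AI Voice Chatbot
12 Dec 2024
Contributed by Lukas
A deep dive into GLM-4-Voice, a Chinese-English bilingual model capable of real-time voice conversation that can adjust vocal tone, speaking rate, dialect, and other subtle...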
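AI Radio FM Technology Channel: A Survey of Video Diffusion Models
10 Dec 2024
Contributed by Lukas
This episode explores recent advances in video diffusion models for AI-generated content, covering three major directions: video generation, editing, and understanding.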
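AI Radio FM Technology Channel: High-Resolution Image Synthesis with Latent Diffusion Models
10 Dec 2024
Contributed by Lukas
This episode explores high-resolution image synthesis, especially recent advances in latent diffusion models (LDMs). We will discuss how LDMs, by reducing computational...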
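AI Radio FM - Vision Transformer: A Revolution in Image Recognition
10 Dec 2024
Contributed by Lukas
A deep dive into how the Vision Transformer (ViT) upended image recognition, and into its outstanding performance on large datasets.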
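AI Radio FM Technology Channel: Diffusion Transformers and Revolutionary Image Generation
10 Dec 2024
Contributed by Lukas
This episode explores how Diffusion Transformers (DiTs) achieved breakthrough progress in image generation and surpassed existing U-Net models. We will analyze D...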
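AI Radio FM Technology Channel: A Deep Dive into Variational Autoencoders
10 Dec 2024
Contributed by Lukas
This episode explores the principles and applications of the variational autoencoder (VAE), revealing how it enables efficient approximate inference and learning. ...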
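AI Radio FM Technology Channel: Design Fundamentals of Diffusion Models Explained
10 Dec 2024
Contributed by Lukas
This episode explores the three core components of diffusion models: the forward process, the reverse process, and the sampling process. It also analyzes various design choices and their impact. We...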
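AI Radio FM Technology Channel: Milestones on the Path to AGI
10 Dec 2024
Contributed by Lukas
Explores a leveled framework for artificial general intelligence (AGI), discussing capabilities, risks, and human-AI interaction.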
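AI Radio FM Technology Channel: A Deep Dive into the HunyuanVideo Large Video Generation Model
05 Dec 2024
Contributed by Lukas
Welcome to the AI Radio FM Technology Channel. This episode explores HunyuanVideo, the open-source video generation model recently released by Tencent's Hunyuan team. We will start from...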
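AI Radio FM - Auto-RAG: The Future of Autonomous Iterative Retrieval
04 Dec 2024
Contributed by Lukas
A deep dive into the Auto-RAG model, explaining its autonomous iterative retrieval mechanism and its strong performance on knowledge-intensive tasks.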
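AI Radio FM Technology Channel: Prompt Engineering Techniques for Generative AI Explained
21 Nov 2024
Contributed by Lukas
This episode explores prompt engineering techniques for generative AI, covering text, image, multimodal, and other prompting methods, and examines the practical...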
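AI Radio FM Technology Channel: A Technical Deep Dive into Octopus v3, an Ultra-Compact Multimodal AI Agent
21 Nov 2024
Contributed by Lukas
This episode explores Octopus v3, a multimodal AI agent with fewer than 1 billion parameters that can run on edge devices. Joined by experts, we will examine it from a technical...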
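AI Radio FM - Machete: A GEMM Kernel Optimized for Hopper GPUs
19 Nov 2024
Contributed by Lukas
A deep dive into Neural Magic's Machete kernel, which is optimized for mixed-input quantization on NVIDIA Hopper GPUs and significantly improves large language model inference performance.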
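AI Radio FM - GPU Recommendations for Deep Learning
19 Nov 2024
Contributed by Lukas
Discusses the best GPUs for deep learning in 2023, covering GPU architecture, performance, cost-effectiveness, and more.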
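AI Radio FM - Technology Channel: Scaling Laws for Precision
14 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast. Today, we take a deep dive into scaling laws for precision in large language models...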
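AI Radio FM - Technology Channel: Knowledge Distillation in Neural Networks
13 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! Today we explore the fascinating world of knowledge distillation in neural networks...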
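AI Radio FM - Technology Channel: Moshi, a Speech-Text Foundation Model for Real-Time Dialogue
10 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast. Today, we dive into Moshi, a speech-text foundation...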
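HourVideo: A New Benchmark Dataset for Evaluating Hour-Long Video-Language Understanding
10 Nov 2024
Contributed by Lukas
HourVideo is a novel benchmark dataset designed to rigorously evaluate how well multimodal models understand hour-long videos. The dataset includes a new task...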
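AI Radio FM - Technology Channel: Beyond Text: Optimizing Multimodal RAG for Industrial Applications
10 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast. Today, we discuss a paper on using multimodal inputs to optimize RA...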
AI Radio FM - Technology Channel: LLM × MapReduce: Simplified Long-Sequence Processing Using Large Language Models
10 Nov 2024
Contributed by Lukas
In this exciting episode of AI Radio FM - Technology Channel, we take a deep dive into a breakthrough technique called LLM × MapReduce, which can...
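AI Radio FM - Technology Channel: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
10 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast. Today, we explore a paper on using low-rank components to absorb outliers and enable...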
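AI Radio FM - Technology Channel: Mixture-of-Transformers, a Sparse and Scalable Architecture for Multimodal Foundation Models
09 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! In today's episode, we dive into a study on Mixture-of-Transfo...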
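AI Radio FM - Technology Channel: Hunyuan3D-1.0, a Revolutionary 3D Generation Framework
09 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! Today, we explore Tencent's latest breakthrough, Hunyuan3D-1.0, a...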
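AI Radio FM - Technology Channel - A Deep Dive into Docling, an Open-Source PDF Conversion Tool
09 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! Today, we dive into Docling, an open-source PDF document conversion...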
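AI Radio FM - Technology Channel: Exploring Mini-Omni2, an Open-Source Multimodal Language Model Toward GPT-4o
09 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! Today we dive into an exciting topic: Mini-Omni2, ...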
AI Radio FM - Technology Channel: StoryAgent - Customized Storytelling Video Generation via Multi-Agent Collaboration
09 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, Your Personal Generative AI Podcast! Today, we're diving into the world of StoryAgent, a groundbrea...
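AI Radio FM - Technology Channel: Freeze-Omni: A Smart and Low-Latency Speech-to-Speech Dialogue Model
09 Nov 2024
Contributed by Lukas
Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! Today, we dive into Freeze-Omni, which has a frozen ...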