AI Podcast

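LLaVA-OneVision: Easy Visual Task Transfer

09 Feb 2025

Contributed by Lukas

A look at LLaVA-OneVision, an open-source family of large multimodal models that consolidates, from the LLaVA-NeXT blog series, the data, model, and visual representation...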

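VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

09 Feb 2025

Contributed by Lukas

This episode takes a deep dive into VITA-1.5, a multimodal large language model designed for real-time vision and speech interaction. We discuss its architecture, training strategy...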

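Hibiki: High-Fidelity Simultaneous Speech-to-Speech Translation

08 Feb 2025

Contributed by Lukas

This podcast takes a deep dive into Hibiki, an innovative decoder model for simultaneous speech translation. We discuss its architecture, training method, and, on French-English...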

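Kimi k1.5: Scaling Large Language Models with Reinforcement Learning

08 Feb 2025

Contributed by Lukas

This podcast takes a deep dive into how the Kimi team used reinforcement learning (RL) to train its latest multimodal large language model, Kimi k1.5. Topics include RL training techniques, ...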

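Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis

07 Feb 2025

Contributed by Lukas

This podcast discusses the Omni-Emotion model, which enhances video multimodal large language models (MLLMs) by integrating audio and fine-grained facial information, thereby...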

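HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding

07 Feb 2025

Contributed by Lukas

A deep dive into HumanOmni, a multimodal large language model designed for understanding human-centric scenes. We discuss its dataset construction, model architecture...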

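Align-Anything: Training Multimodal Models with Language Feedback

06 Feb 2025

Contributed by Lukas

This podcast discusses a new framework called Align-Anything, which aims to improve multimodal models by leveraging human feedback, especially language feedback...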

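OmniHuman: A Mixed-Condition Human Animation Model

06 Feb 2025

Contributed by Lukas

A look at OmniHuman, a Diffusion Transformer-based framework that scales up data by mixing motion-related conditions to achieve highly realistic human video generation...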

Scaling LLM Test-Time Compute Optimally

01 Feb 2025

Contributed by Lukas

A podcast discussing how to optimize the use of test-time computation for large language models (LLMs), focusing on strategies like searching against ...

LLM Test-Time Compute Scaling: An In-Depth Analysis

01 Feb 2025

Contributed by Lukas

A podcast discussing how to optimally scale test-time compute for Large Language Models (LLMs), focusing on improving both verifiers and the model...

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

28 Jan 2025

Contributed by Lukas

Discussing the Janus-Pro multimodal model.

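JanusFlow: A Unified Framework for Multimodal Understanding and Generation

28 Jan 2025

Contributed by Lukas

A podcast about JanusFlow, a powerful framework that unifies image understanding and generation in a single model by integrating autoregressive...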

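AI Tech Frontier: The Janus Unified Multimodal Framework Explained

28 Jan 2025

Contributed by Lukas

Welcome to AI Radio FM - Technology Channel, your personal generative AI podcast! Today we take a deep dive into an innovative multimodal framework called Janus. Jan...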

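Hunyuan3D 2.0: Diffusion Models for High-Resolution Textured 3D Asset Generation

24 Jan 2025

Contributed by Lukas

This podcast discusses Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. The system includes two foundation components:...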

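DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning

24 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into the DeepSeek-R1 model, which significantly improves the reasoning ability of large language models through large-scale reinforcement learning. We analyze DeepSee...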

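DistServe: Disaggregated Prefill and Decoding for High-Throughput Large Language Model Serving

23 Jan 2025

Contributed by Lukas

This podcast discusses DistServe, a system that improves large language model (LLM) serving performance by separating prefill and decoding computation. We take a deep dive into...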

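Efficiency Optimization for Large-Scale Transformer Inference

20 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into how to efficiently deploy large Transformer models for generative inference, especially in latency-sensitive and long-sequence scenarios. We...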

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

19 Jan 2025

Contributed by Lukas

A podcast discussing the research paper 'Addressing Representation Collapse in Vector Quantized Models with One Linear Layer'

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

19 Jan 2025

Contributed by Lukas

A podcast discussing a new approach to audio-driven portrait animation called Sonic, focusing on global audio perception rather than visual cues. The ...

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

18 Jan 2025

Contributed by Lukas

A podcast discussing Titans: Learning to Memorize at Test Time.

AI Radio FM - Technology Channel

18 Jan 2025

Contributed by Lukas

A podcast discussing Tensor Product Attention, a novel attention mechanism for Large Language Models.

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

16 Jan 2025

Contributed by Lukas

A podcast discussing Classifier-Free Diffusion Guidance, a method for improving sample quality in diffusion models without relying on a separate class...

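MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

15 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into MinMo from Alibaba's Tongyi Lab, a multimodal large language model designed for seamless voice interaction. We discuss...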

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

15 Jan 2025

Contributed by Lukas

A podcast discussing the paper Titans: Learning to Memorize at Test Time.

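iSTFTNet: A Fast and Lightweight Mel-Spectrogram Vocoder

09 Jan 2025

Contributed by Lukas

A look at how iSTFTNet uses the inverse short-time Fourier transform to optimize mel-spectrogram vocoders, improving speed and efficiency.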

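AI Tech Frontier: Breakthroughs of the Phi-4 Large Language Model

09 Jan 2025

Contributed by Lukas

A deep dive into Microsoft's newly released Phi-4 large language model and its innovations in data quality, synthetic data, training methods, and post-training optimization. ...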

StyleTTS 2: Towards Human-Level Text-to-Speech

09 Jan 2025

Contributed by Lukas

A podcast discussion about the StyleTTS 2 model for text-to-speech synthesis, focusing on its innovative use of style diffusion and adversarial traini...

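WavChat: A Survey of Spoken Dialogue Models

09 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into recent advances in spoken dialogue models, including their capabilities, representations, training paradigms, and streaming and interactive abilities.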

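Cosmos World Foundation Model Platform: The Future of Physical AI

07 Jan 2025

Contributed by Lukas

A deep dive into the NVIDIA Cosmos world foundation model platform, which aims to advance physical AI through digital twins and world models, accelerating AI...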

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

07 Jan 2025

Contributed by Lukas

A podcast discussing the Story-Adapter framework for long story visualization.

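AI Tech Frontier: A Deep Dive into the Story Diffusion Model

07 Jan 2025

Contributed by Lukas

This episode takes a deep dive into the story diffusion model, a new method for generating coherent images and videos. We analyze its core techniques in detail, including consistent...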

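Intelligent Grimm: Open-Ended Visual Storytelling via Latent Diffusion Models

07 Jan 2025

Contributed by Lukas

This episode discusses a paper on open-ended visual storytelling using latent diffusion models. We take a deep dive into the model's technical details, ...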

AI Radio FM - Technology Channel

07 Jan 2025

Contributed by Lukas

A podcast discussing the IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

07 Jan 2025

Contributed by Lukas

A podcast discussion about PowerInfer-2, a framework for running large language models on smartphones, focusing on its neuron cluster design, adaptive...

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

06 Jan 2025

Contributed by Lukas

A deep dive into LatentSync, an innovative lip-sync framework using audio-conditioned latent diffusion models, its methodology, experiments, and the r...

Infinity: Scaling Bitwise Autoregressive Modeling for High-Resolution Image Synthesis

06 Jan 2025

Contributed by Lukas

A podcast discussing the groundbreaking research paper on Infinity, a novel autoregressive model for high-resolution image synthesis.

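AI-Driven Interactive Head Generation

06 Jan 2025

Contributed by Lukas

This episode takes a deep dive into INFP, an audio-driven head generation framework for two-person conversations. We explore its innovative method, dataset, and experiments...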

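CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

06 Jan 2025

Contributed by Lukas

A podcast about CosyVoice 2, an improved streaming speech synthesis model that leverages large language models to achieve near-human-level natural...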

Flow Matching for Generative Modeling

06 Jan 2025

Contributed by Lukas

A podcast discussing the new paradigm for generative modeling using Continuous Normalizing Flows (CNFs) called Flow Matching (FM). FM offers a simulat...

Swin Transformer: A New Vision Transformer

05 Jan 2025

Contributed by Lukas

A podcast discussing the Swin Transformer, a hierarchical vision transformer using shifted windows for computer vision tasks.

ConvNeXt: A Modern ConvNet for the 2020s

05 Jan 2025

Contributed by Lukas

A podcast discussing the architecture and performance of ConvNeXt, a modern ConvNet model that challenges the dominance of Vision Transformers.

AI Vision Podcast: Masked Autoencoders for Scalable Vision Learning

05 Jan 2025

Contributed by Lukas

A deep dive into Masked Autoencoders (MAE) and their impact on computer vision, discussing their architecture, training efficiency, and performance on...

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

04 Jan 2025

Contributed by Lukas

A podcast discussing the auxiliary-loss-free load balancing strategy for mixture-of-experts models.

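A Survey of Mixture-of-Experts (MoE) Techniques

04 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into recent advances, algorithm design, system implementation, and practical applications of mixture-of-experts (MoE) models. Starting from background on sparse and dense MoE...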

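Zero Bubble Pipeline Parallelism

04 Jan 2025

Contributed by Lukas

This episode takes a deep dive into zero bubble pipeline parallelism, an innovative method aimed at improving the efficiency of large-scale distributed training. We analyze traditional...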

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

04 Jan 2025

Contributed by Lukas

A podcast discussion about GShard, a module for scaling neural networks using conditional computation and automatic sharding, focusing on its applicat...

AI Radio FM - Technology Channel: GShard and Giant Models

04 Jan 2025

Contributed by Lukas

A deep dive into GShard, a module for scaling giant neural networks, focusing on its application to multilingual machine translation and its impact on...

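Optimizing Mixture-of-Experts Training with a Hybrid Tensor-Expert-Data Parallelism Approach

04 Jan 2025

Contributed by Lukas

A deep dive into DeepSpeed-TED, a novel three-dimensional hybrid parallel framework for training mixture-of-experts models with large base models. We discuss memory...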

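Unified Sequence Parallelism: Empowering Long-Context Generative AI

04 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into Unified Sequence Parallelism (USP), a method for training extremely long-context generative...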

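LoongTrain: Efficient Training of Long-Sequence Large Language Models

04 Jan 2025

Contributed by Lukas

This episode takes a deep dive into LoongTrain, an efficient training framework designed for long-sequence large language models. We discuss its core 2D attention mechanism, as well as...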

Ring Attention with Blockwise Transformers for Near-Infinite Context

04 Jan 2025

Contributed by Lukas

A podcast discussing a novel approach to scale transformer models to handle near-infinite context lengths.

FlashAttention-3: Revolutionizing Attention Mechanisms on GPUs

04 Jan 2025

Contributed by Lukas

A podcast discussing the FlashAttention-3 algorithm, its improvements over previous versions, and its impact on large language models.

AI FlashAttention-2 Podcast

04 Jan 2025

Contributed by Lukas

A fast-paced discussion on FlashAttention-2, a faster attention mechanism for Transformers, exploring its algorithms, parallelism, and performance ben...

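FlashAttention: Fast and Memory-Efficient Exact Attention

04 Jan 2025

Contributed by Lukas

A look at the FlashAttention algorithm, a new method for fast, memory-efficient exact attention on GPUs. The episode analyzes its IO complexity in depth and compares it with existing...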

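DeepSpeed Ulysses: System Optimizations for Training Extreme Long-Sequence Transformer Models

04 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into DeepSpeed Ulysses, an innovative method for training Transformer models with extremely long sequence lengths, which, by optimizing sequence parallelism...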

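DistFlashAttn: Distributed Memory-Efficient Attention for Long-Context Large Language Model Training

04 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into DistFlashAttn, a distributed memory-efficient attention mechanism designed for long-context large language model training, with a detailed breakdown of its core techniques...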

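Reducing Activation Recomputation in Large Transformer Models

04 Jan 2025

Contributed by Lukas

This podcast discusses a new method for accelerating the training of large Transformer models by reducing activation recomputation. We take a deep dive into sequence parallelism and selective...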

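Sequence Parallelism: Long Sequence Training from a System Perspective

04 Jan 2025

Contributed by Lukas

A look at a memory-efficient parallelism method called "sequence parallelism," which aims to break the limit on input sequence length and enables efficient training on GPUs of longer...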

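AI-Driven Large-Scale Language Model Training: Efficient Practice with Megatron-LM on GPU Clusters

04 Jan 2025

Contributed by Lukas

This episode takes a deep dive into how to efficiently train large-scale language models on GPU clusters using Megatron-LM, focusing on tensor parallelism, pipeline parallelism, and data...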

AI Radio FM - Technology Channel: PagedAttention for Large Language Model Serving

04 Jan 2025

Contributed by Lukas

A podcast discussing PagedAttention, a novel memory management technique for serving large language models, and its implementation in vLLM.

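ORCA: A Distributed Serving System for Transformer-Based Generative Models

04 Jan 2025

Contributed by Lukas

This episode takes a deep dive into ORCA, a distributed serving system designed for Transformer models. We describe in detail its innovative iteration-level scheduling and selective...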

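LLM Inference Optimization: 23x Throughput Improvement with Continuous Batching

04 Jan 2025

Contributed by Lukas

This episode takes a deep dive into continuous batching for large language model (LLM) inference, revealing how it significantly increases throughput and reduces latency. We...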

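Mooncake: A KVCache-Centric Disaggregated Architecture for LLM Serving

04 Jan 2025

Contributed by Lukas

This podcast takes a deep dive into the innovative architecture of Mooncake, a disaggregated system designed for efficiently serving large language models.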

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

02 Jan 2025

Contributed by Lukas

A podcast discussing the InternLM-XComposer2 model, its architecture, and capabilities in free-form text-image composition and comprehension.

AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

02 Jan 2025

Contributed by Lukas

A fast-paced, enthusiastic podcast discussing the latest advancements in AI, focusing on the InternLM-XComposer2-4KHD model.

AI Radio FM - Technology Channel

02 Jan 2025

Contributed by Lukas

A podcast discussing InternLM-XComposer-2.5, a versatile large vision language model.

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

02 Jan 2025

Contributed by Lukas

A podcast discussing the InternLM-XComposer2.5-OmniLive system, a novel multimodal AI for long-term video and audio interaction.

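DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Models

28 Dec 2024

Contributed by Lukas

This episode takes a deep dive into DeepSeekMoE, an innovative mixture-of-experts architecture that aims for ultimate specialization of expert knowledge. We discuss its core strategies...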

DeepSeek-V3: A Deep Dive into a Powerful Mixture-of-Experts Model

27 Dec 2024

Contributed by Lukas

A podcast discussion analyzing the DeepSeek-V3 technical report, covering its architecture, training, and performance.

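E2 TTS: Surprisingly Simple Zero-Shot Text-to-Speech

27 Dec 2024

Contributed by Lukas

This episode takes a deep dive into E2 TTS, a fully non-autoregressive zero-shot text-to-speech system that, in naturalness, speaker similarity, and intelligibility...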

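BigVGAN: A Universal Neural Vocoder with Large-Scale Training

27 Dec 2024

Contributed by Lukas

This podcast discusses BigVGAN, a universal neural vocoder that achieves high-fidelity audio synthesis through large-scale training and, across a variety of out-of-distribution scenarios, performs...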

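F5-TTS: Breakthrough Text-to-Speech Technology

27 Dec 2024

Contributed by Lukas

A deep dive into F5-TTS, a flow-matching-based non-autoregressive text-to-speech system that excels at zero-shot speech synthesis.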

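In Plain Terms: The Evolution and Applications of the Attention Mechanism

25 Dec 2024

Contributed by Lukas

This episode takes a deep dive into the evolution and applications of attention mechanisms in deep learning, from the limitations of Seq2Seq models to the innovations of the Transformer, and on to Self-A...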

Speech and Language Processing

24 Dec 2024

Contributed by Lukas

A podcast discussing the content from Daniel Jurafsky and James H. Martin's "Speech and Language Processing" textbook, Third Edition dr...

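From Slow Bidirectional to Fast Causal Video Generators

23 Dec 2024

Contributed by Lukas

This podcast discusses a new video generation method that converts a pretrained bidirectional diffusion model into a causal model, combined with distribution matching distillation...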

PaliGemma 2: A Versatile Vision-Language Model

21 Dec 2024

Contributed by Lukas

A podcast discussion about PaliGemma 2, a family of versatile vision-language models, and its capabilities in various tasks.

Byte Latent Transformer: Patches Scale Better Than Tokens

21 Dec 2024

Contributed by Lukas

A podcast discussing the Byte Latent Transformer (BLT), a novel byte-level LLM architecture that matches tokenization-based LLM performance with impro...

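AI Frontier: A Deep Dive into the POINTS1.5 Vision-Language Model

21 Dec 2024

Contributed by Lukas

This episode takes a deep dive into POINTS1.5, the latest vision-language model from Tencent's WeChat AI team, with a comprehensive analysis ranging from model architecture and bilingual support to training strategy...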

ModernBERT: A Deep Dive into Efficient Encoder Models

21 Dec 2024

Contributed by Lukas

An in-depth discussion of the ModernBERT paper, exploring its architecture, training methodology, and performance across various NLP tasks.

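Open-Sora Plan: An Open-Source Large Video Generation Model

15 Dec 2024

Contributed by Lukas

This podcast takes a deep dive into Open-Sora Plan, an open-source project that aims to generate high-quality, long-duration videos. We analyze in detail its core models, auxiliary strategies...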

Wavelet Flow VAE for Latent Video Diffusion Models

15 Dec 2024

Contributed by Lukas

A podcast discussion about Wavelet Flow VAE (WF-VAE), a novel autoencoder that leverages multi-level wavelet transforms to enhance video encoding effi...

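AI Radio FM - Technology Channel, Your Personal Generative AI Podcast

14 Dec 2024

Contributed by Lukas

In this episode we take a deep dive into POINTS1.5, a vision-language model that performs well in real-world applications.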

Gemini 2.0 Flash for Developers

12 Dec 2024

Contributed by Lukas

A podcast discussing the new Gemini 2.0 Flash model and its capabilities for developers.

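Gemini 2.0: A New Era as the Agentic Age Arrives

12 Dec 2024

Contributed by Lukas

Google has launched Gemini 2.0, a new AI model built for the agentic era. This podcast takes a deep dive into Gemini 2.0's features, applications, and Google's...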

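AI Radio FM Technology Channel: A Deep Dive into Generative Adversarial Networks (GANs)

12 Dec 2024

Contributed by Lukas

This episode takes a deep dive into generative adversarial networks (GANs), proposed by Ian Goodfellow et al. in 2014, uncovering their principles, advantages, and applications, and giving you a look at the AI field's...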

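AI Radio FM Technology Channel: A Comprehensive Survey of Multimodal Large Language Model Evaluation

12 Dec 2024

Contributed by Lukas

This episode takes a deep dive into evaluation methods for multimodal large language models (MLLMs), covering benchmark types, the benchmark construction process, evaluation methods, and future...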

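AI Radio FM Technology Channel: A Survey of Multimodal Large Language Models

12 Dec 2024

Contributed by Lukas

This episode takes a deep dive into recent advances in multimodal large language models (MLLMs), covering architecture, training strategies, data, evaluation methods, and future development directions...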

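AI Radio FM - GLM-4-Voice: An AI Voice Chatbot

12 Dec 2024

Contributed by Lukas

An in-depth discussion of GLM-4-Voice, which supports both Chinese and English, offers real-time voice conversation, and follows user instructions to adjust intonation, speaking rate, dialect, and other subtle...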

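AI Radio FM Technology Channel: A Survey of Video Diffusion Models

10 Dec 2024

Contributed by Lukas

This episode takes a deep dive into recent advances in video diffusion models for AI content generation, covering three areas: video generation, editing, and understanding.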

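AI Radio FM Technology Channel: High-Resolution Image Synthesis with Latent Diffusion Models

10 Dec 2024

Contributed by Lukas

This episode takes a deep dive into high-resolution image synthesis, in particular recent advances in latent diffusion models (LDMs). We discuss how LDMs, by reducing computational...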

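AI Radio FM - Vision Transformer: A Revolution in Image Recognition

10 Dec 2024

Contributed by Lukas

An in-depth discussion of how the Vision Transformer (ViT) has upended image recognition, and of its excellent performance on large datasets.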

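AI Radio FM Technology Channel: Diffusion Transformers, a Revolution in Image Generation

10 Dec 2024

Contributed by Lukas

This episode takes a deep dive into how Diffusion Transformers (DiTs) achieve breakthrough progress in image generation and surpass existing U-Net models. We analyze D...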

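AI Radio FM Technology Channel: A Deep Dive into Variational Autoencoders

10 Dec 2024

Contributed by Lukas

This episode takes a deep dive into the principles and applications of the Variational Autoencoder (VAE), uncovering the secrets of efficient approximate inference and learning. ...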

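AI Radio FM Technology Channel: Diffusion Model Design Fundamentals Explained

10 Dec 2024

Contributed by Lukas

This episode takes a deep dive into the three core components of diffusion models (the forward process, the reverse process, and the sampling process) and analyzes various design choices and their effects. We...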

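AI Radio FM Technology Channel: Milestones on the Path to AGI

10 Dec 2024

Contributed by Lukas

Exploring a levels framework for artificial general intelligence (AGI) and discussing its capabilities, risks, and human-AI interaction.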

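AI Radio FM Technology Channel: A Deep Dive into HunyuanVideo, a Large Video Generation Model

05 Dec 2024

Contributed by Lukas

Welcome to AI Radio FM Technology Channel. This episode takes a deep dive into HunyuanVideo, the open-source video generation model newly released by Tencent's Hunyuan team. Starting from...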

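AI Radio FM - Auto-RAG: The Future of Autonomous Iterative Retrieval

04 Dec 2024

Contributed by Lukas

An in-depth discussion of the Auto-RAG model, uncovering its autonomous iterative retrieval mechanism and its strong performance on knowledge-intensive tasks.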

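AI Radio FM Technology Channel: Prompt Engineering Techniques for Generative AI Explained

21 Nov 2024

Contributed by Lukas

This episode takes a deep dive into prompt engineering techniques for generative AI, covering prompting methods for text, images, multimodal inputs, and more, and examines prompt engineering's practical...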

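AI Radio FM Technology Channel: Octopus v3, an Ultra-Small Multimodal AI Agent, Explained

21 Nov 2024

Contributed by Lukas

This episode takes a deep dive into Octopus v3, a multimodal AI agent with fewer than 1 billion parameters that can run on edge devices. Together with experts, we look at it from a technical...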

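AI Radio FM - Machete: A GEMM Kernel Optimized for Hopper GPUs

19 Nov 2024

Contributed by Lukas

An in-depth discussion of Neural Magic's Machete kernel, optimized for mixed-input quantization on NVIDIA Hopper GPUs, which significantly improves large language model inference performance.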
