Data Engineering Podcast
Episodes
From Context to Semantics: How Metadata Powers Agentic AI
21 Dec 2025
Contributed by Lukas
Summary In this episode Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into t...
From Data Engineering to AI Engineering: Where the Lines Blur
14 Dec 2025
Contributed by Lukas
Summary In this solo episode of the Data Engineering Podcast, host Tobias Macey reflects on how AI has transformed the practice and pace of data ...
Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics
08 Dec 2025
Contributed by Lukas
Summary In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of...
Blurring Lines: Data, AI, and the New Playbook for Team Velocity
24 Nov 2025
Contributed by Lukas
Summary In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams bui...
State, Scale, and Signals: Rethinking Orchestration with Durable Execution
16 Nov 2025
Contributed by Lukas
Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams b...
The AI Data Paradox: High Trust in Models, Low Trust in Data
09 Nov 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a...
Bridging the AI–Data Gap: Collect, Curate, Serve
02 Nov 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's ...
Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access
27 Oct 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials...
The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies
18 Oct 2025
Contributed by Lukas
Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining l...
Context Engineering as a Discipline: Building Governed AI Analytics
11 Oct 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Comp...
The Data Model That Captures Your Business: Metric Trees Explained
05 Oct 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees: a new approach to data ...
From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
28 Sep 2025
Contributed by Lukas
Summary In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing A...
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
18 Sep 2025
Contributed by Lukas
Summary In this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transfo...
Duck Lake: Simplifying the Lakehouse Ecosystem
10 Sep 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a ...
Aligning Business and Data: The Essential Role of Data Modeling
01 Sep 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data ...
From Academia to Industry: Bridging Data Engineering Challenges
26 Aug 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge g...
High Performance And Low Overhead Graphs With KuzuDB
18 Aug 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth...
Bridging Data and Decision-Making: AI's Role in Modern Analytics
12 Aug 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomou...
From Bits to Tables: The Evolution of S3 Storage
05 Aug 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Andy Warfield talks about the innovative functionalities of S3 Tables and Vectors and their int...
Revolutionizing Python Notebooks with Marimo
28 Jul 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offe...
Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics
21 Jul 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in war...
Streamlining Data Pipelines with MCP Servers and Vector Engines
15 Jul 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process uns...
Foundational Data Engineering At Two Sigma
06 Jul 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexiti...
Enabling Agents In The Enterprise With A Platform Approach
29 Jun 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Arun Joseph talks about developing and implementing agent platforms to empower businesses with ...
Dagster's New Era: Modularizing Data Transformation in the Age of AI
18 Jun 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscap...
AI and the Lakehouse: How Starburst is Pioneering New Workflows
11 Jun 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with...
Amazon S3: The Backbone of Modern Data Systems
03 Jun 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazo...
Scaling Data Operations With Platform Engineering
29 May 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings...
From Data Discovery to AI: The Evolution of Semantic Layers
21 May 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Shinji Kim to discuss the evolving role of semantic layers in t...
Balancing Off-the-Shelf and Custom Solutions in Data Engineering
13 May 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Tulika Bhatt, a senior software engineer at Netflix, talks about her experiences with large-sca...
StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics
05 May 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Sida Shen, product manager at CelerData, talks about StarRocks, a high-performance analytical d...
Exploring NATS: A Multi-Paradigm Connectivity Layer for Distributed Applications
28 Apr 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Derek Collison, creator of NATS and CEO of Synadia, talks about the evolution and capabilities ...
Advanced Lakehouse Management With The LakeKeeper Iceberg REST Catalog
21 Apr 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Viktor Kessler, co-founder of Vakmo, talks about the architectural patterns in the lakehouse e...
Simplifying Data Pipelines with Durable Execution
12 Apr 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Jeremy Edberg, CEO of DBOS, talks about durable execution and its impact on designing and implementin...
Overcoming Redis Limitations: The Dragonfly DB Approach
30 Mar 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Roman Gershman, CTO and founder of Dragonfly DB, explores the development and impact of high-sp...
Bringing AI Into The Inner Loop of Data Engineering With Ascend
24 Mar 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Sean Knapp, CEO of Ascend.io, explores the intersection of AI and data engineering. He discusse...
Astronomer's Role in the Airflow Ecosystem: A Deep Dive with Pete DeJoy
16 Mar 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflo...
Accelerated Computing in Modern Data Centers With Datapelago
08 Mar 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Rajan Goyal, CEO and co-founder of Datapelago, talks about improving efficiencies in data proce...
The Future of Data Engineering: AI, LLMs, and Automation
26 Feb 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of Datafold, talks about the intersection of AI and data en...
Evolving Responsibilities in AI Data Management
16 Feb 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey fro...
CSVs Will Never Die And OneSchema Is Counting On It
13 Jan 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shar...
Breaking Down Data Silos: AI and ML in Master Data Management
03 Jan 2025
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) ...
Building a Data Vision Board: A Guide to Strategic Planning
23 Dec 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Lior Barak shares his insights on developing a three-year strategic vision for data management....
How Orchestration Impacts Data Platform Architecture
16 Dec 2024
Contributed by Lukas
Summary The core task of data engineering is managing the flows of data through an organization. In order to ensure those flows are executing on schedu...
An Exploration Of The Impediments To Reusable Data Pipelines
08 Dec 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explo...
The Art of Database Selection and Evolution
01 Dec 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his ...
Bridging Code and UI in Data Orchestration with Kestra
26 Nov 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestratio...
Streaming Data Into The Lakehouse With Iceberg And Trino At Going
18 Nov 2024
Contributed by Lukas
In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino a...
An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin
11 Nov 2024
Contributed by Lukas
Summary The challenges of integrating all of the tools in the modern data stack have led to a new generation of tools that focus on a fully integrated w...
Feldera: Bridging Batch and Streaming with Incremental Computation
04 Nov 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous co...
Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent
27 Oct 2024
Contributed by Lukas
Summary Gleb Mezhanskiy, CEO and co-founder of Datafold, joins Tobias Macey to discuss the challenges and innovations in data migrations. Gleb shares h...
Bring Vector Search And Storage To The Data Lake With Lance
20 Oct 2024
Contributed by Lukas
Summary The rapid growth of generative AI applications has prompted a surge of investment in vector databases. While there are numerous engines availab...
The Role of Python in Shaping the Future of Data Platforms with DLT
13 Oct 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast, Adrian Broderieux and Marcin Rudolph, co-founders of DLT Hub, delve into the principles guidin...
Build Your Data Transformations Faster And Safer With SDF
06 Oct 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast Lukas Schulte, co-founder and CEO of SDF, explores the development and capabilities of this fas...
Scaling Airbyte: Challenges and Milestones on the Road to 1.0
23 Sep 2024
Contributed by Lukas
Summary Airbyte is one of the most prominent platforms for data movement. Over the past 4 years they have invested heavily in solutions for scaling the...
Enhancing Data Accessibility and Governance with Gravitino
01 Sep 2024
Contributed by Lukas
Summary As data architectures become more elaborate and the number of applications of data increases, it becomes increasingly challenging to locate and...
The Evolution of DataOps: Insights from DataKitchen's CEO
04 Aug 2024
Contributed by Lukas
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Berg, CEO of DataKitchen, to discuss his ongoing mission...
Achieving Data Reliability: The Role of Data Contracts in Modern Data Management
28 Jul 2024
Contributed by Lukas
Summary Data contracts are both an enforcement mechanism for data quality and a promise to downstream consumers. In this episode Tom Baeyens returns t...
How Generative AI Is Impacting Data Engineering Teams
21 Jul 2024
Contributed by Lukas
Summary Generative AI has rapidly gained adoption for numerous use cases. To support those applications, organizational data platforms need to add new ...
The Role of Product Managers in Data-Centric Organizations
13 Jul 2024
Contributed by Lukas
Summary In this episode Praveen Gujar, Director of Product at LinkedIn, talks about the intricacies of product management for data and analytical platf...
Neon: A Serverless And Developer Friendly Postgres
08 Jul 2024
Contributed by Lukas
Summary Postgres is one of the most widely respected and liked database engines ever. To make it even easier for developers to use, Nikita Shamg...
Improve Data Quality Through Engineering Rigor And Business Engagement With Synq
30 Jun 2024
Contributed by Lukas
Summary This episode features an insightful conversation with Petr Janda, the CEO and founder of Synq. Petr shares his journey from being an engineer t...
Stitching Together Enterprise Analytics With Microsoft Fabric
23 Jun 2024
Contributed by Lukas
Summary Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise, Microsoft has created the Fab...
Being Data Driven At Stripe With Trino And Iceberg
16 Jun 2024
Contributed by Lukas
Summary Stripe is a company that relies on data to power their products and business. To support that functionality they have invested in Trino and...
X-Ray Vision For Your Flink Stream Processing With Datorios
09 Jun 2024
Contributed by Lukas
Summary Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines i...
Practical First Steps In Data Governance For Long Term Success
02 Jun 2024
Contributed by Lukas
Summary Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that ...
Data Migration Strategies For Large Scale Systems
27 May 2024
Contributed by Lukas
Summary Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the dat...
Zenlytic Is Building You A Better Coworker With AI Agents
19 May 2024
Contributed by Lukas
Summary The purpose of business intelligence systems is to allow anyone in the business to access and decode data to help them make informed decisi...
Release Management For Data Platform Services And Logic
12 May 2024
Contributed by Lukas
Summary Building a data platform is a substantial engineering endeavor. Once it is running, the next challenge is figuring out how to address rele...
Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach
05 May 2024
Contributed by Lukas
Summary Artificial intelligence has dominated the headlines for several months due to the successes of large language models. This has prompted numerou...
Build Your Second Brain One Piece At A Time
28 Apr 2024
Contributed by Lukas
Summary Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through ...
Making Email Better With AI At Shortwave
21 Apr 2024
Contributed by Lukas
Summary Generative AI has rapidly transformed everything in the technology sector. When Andrew Lee started work on Shortwave he was focused on maki...
Designing A Non-Relational Database Engine
14 Apr 2024
Contributed by Lukas
Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational en...
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer
07 Apr 2024
Contributed by Lukas
Summary Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business ...
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary
31 Mar 2024
Contributed by Lukas
Summary Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is...
Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+
24 Mar 2024
Contributed by Lukas
Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building de...
Reconciling The Data In Your Databases With Datafold
17 Mar 2024
Contributed by Lukas
Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is ...
Version Your Data Lakehouse Like Your Software With Nessie
10 Mar 2024
Contributed by Lukas
Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges th...
When And How To Conduct An AI Program
03 Mar 2024
Contributed by Lukas
Summary Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a ...
Find Out About The Technology Behind The Latest PFAD In Analytical Database Development
25 Feb 2024
Contributed by Lukas
Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and develo...
Using Trino And Iceberg As The Foundation Of Your Data Lakehouse
18 Feb 2024
Contributed by Lukas
Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user...
Data Sharing Across Business And Platform Boundaries
11 Feb 2024
Contributed by Lukas
Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to...
Tackling Real Time Streaming Data With SQL Using RisingWave
04 Feb 2024
Contributed by Lukas
Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave...
Build A Data Lake For Your Security Logs With Scanner
29 Jan 2024
Contributed by Lukas
Summary Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. Th...
Modern Customer Data Platform Principles
22 Jan 2024
Contributed by Lukas
Summary Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed...
Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel
07 Jan 2024
Contributed by Lukas
Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that...
Designing Data Platforms For Fintech Companies
01 Jan 2024
Contributed by Lukas
Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In...
Troubleshooting Kafka In Production
24 Dec 2023
Contributed by Lukas
Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it ...
Adding An Easy Mode For The Modern Data Stack With 5X
18 Dec 2023
Contributed by Lukas
Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools fo...
Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack
11 Dec 2023
Contributed by Lukas
Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers f...
Designing Data Transfer Systems That Scale
04 Dec 2023
Contributed by Lukas
Summary The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfe...
Addressing The Challenges Of Component Integration In Data Platform Architectures
27 Nov 2023
Contributed by Lukas
Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities...
Unlocking Your dbt Projects With Practical Advice For Practitioners
20 Nov 2023
Contributed by Lukas
Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. While it is easy to adopt, there are many po...
Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine
13 Nov 2023
Contributed by Lukas
Summary Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of...
Shining Some Light In The Black Box Of PostgreSQL Performance
06 Nov 2023
Contributed by Lukas
Summary Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a...
Surveying The Market Of Database Products
30 Oct 2023
Contributed by Lukas
Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has ex...
Defining A Strategy For Your Data Products
23 Oct 2023
Contributed by Lukas
Summary The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable...
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable
15 Oct 2023
Contributed by Lukas
Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challe...
Using Data To Illuminate The Intentionally Opaque Insurance Industry
09 Oct 2023
Contributed by Lukas
Summary The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a bu...
Building ETL Pipelines With Generative AI
01 Oct 2023
Contributed by Lukas
Summary Artificial intelligence applications require substantial high quality data, which is provided through ETL pipelines. Now that AI has reache...