
Data Engineering Podcast

Technology Education

Episodes

Showing 101-200 of 494
Page 2 of 5

Powering Vector Search With Real Time And Incremental Vector Indexes

25 Sep 2023

Contributed by Lukas

Summary The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare ...

Building Linked Data Products With JSON-LD

17 Sep 2023

Contributed by Lukas

Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Li...

An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem

10 Sep 2023

Contributed by Lukas

Summary Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that co...

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library

04 Sep 2023

Contributed by Lukas

Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, w...

Building An Internal Database As A Service Platform At Cloudflare

28 Aug 2023

Contributed by Lukas

Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud, most developers rely on hosted service...

Harnessing Generative AI For Creating Educational Content With Illumidesk

20 Aug 2023

Contributed by Lukas

Summary Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share th...

Unpacking The Seven Principles Of Modern Data Pipelines

14 Aug 2023

Contributed by Lukas

Summary Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful, you will end up...

Quantifying The Return On Investment For Your Data Team

06 Aug 2023

Contributed by Lukas

Summary As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are be...

Strategies For A Successful Data Platform Migration

31 Jul 2023

Contributed by Lukas

Summary All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for you...

Build Real Time Applications With Operational Simplicity Using Dozer

24 Jul 2023

Contributed by Lukas

Summary Real-time data processing has steadily been gaining adoption due to advances in the accessibility of the technologies involved. Despite tha...

Datapreneurs - How Today's Business Leaders Are Using Data To Define The Future

17 Jul 2023

Contributed by Lukas

Summary Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row s...

Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling

09 Jul 2023

Contributed by Lukas

Summary For business analytics, the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered qu...

How Data Engineering Teams Power Machine Learning With Feature Platforms

03 Jul 2023

Contributed by Lukas

Summary Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and proced...

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

25 Jun 2023

Contributed by Lukas

Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized i...

How Column-Aware Development Tooling Yields Better Data Models

18 Jun 2023

Contributed by Lukas

Summary Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the co...

Build Better Tests For Your dbt Projects With Datafold And data-diff

11 Jun 2023

Contributed by Lukas

Summary Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be ...

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service

04 Jun 2023

Contributed by Lukas

Summary A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps h...

A Roadmap To Bootstrapping The Data Team At Your Startup

29 May 2023

Contributed by Lukas

Summary Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probabl...

Keep Your Data Lake Fresh With Real Time Streams Using Estuary

21 May 2023

Contributed by Lukas

Summary Batch vs. streaming is a long-running debate in the world of data integration and transformation. Proponents of the streaming paradigm argu...

What Happens When The Abstractions Leak On Your Data

15 May 2023

Contributed by Lukas

Summary All of the advancements in our technology are based on the principles of abstraction. These are valuable until they break down, which is...

Use Consistent And Up To Date Customer Profiles To Power Your Business With Segment Unify

07 May 2023

Contributed by Lukas

Summary Every business has customers, and a critical element of success is understanding who they are and how they are using the company's products...

Realtime Data Applications Made Easier With Meroxa

24 Apr 2023

Contributed by Lukas

Summary Real-time capabilities have quickly become an expectation for consumers. The complexity of providing those capabilities is still high, howe...

Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic

16 Apr 2023

Contributed by Lukas

Summary Business intelligence has been chasing the promise of self-serve data for decades. As the capabilities of these systems have improved and b...

An Exploration Of The Composable Customer Data Platform

10 Apr 2023

Contributed by Lukas

Summary The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for dat...

Mapping The Data Infrastructure Landscape As A Venture Capitalist

03 Apr 2023

Contributed by Lukas

Summary The data ecosystem has been building momentum for several years now. As a venture capital investor, Matt Turck has been trying to keep track...

Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite

25 Mar 2023

Contributed by Lukas

Summary The promise of streaming data is that it allows you to react to new information as it happens, rather than introducing latency by batching ...

Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed

19 Mar 2023

Contributed by Lukas

Summary As with all aspects of technology, security is a critical element of data applications, and the different controls can be at cross purposes...

Use Your Data Warehouse To Power Your Product Analytics With NetSpring

10 Mar 2023

Contributed by Lukas

Summary With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that...

Exploring The Nuances Of Building An Intentional Data Culture

06 Mar 2023

Contributed by Lukas

Summary The ecosystem for data professionals has matured to the point that there are a large and growing number of distinct roles. With the scope a...

Building A Data Mesh Platform At PayPal

27 Feb 2023

Contributed by Lukas

Summary There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Pe...

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

19 Feb 2023

Contributed by Lukas

Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limit...

Let The Whole Team Participate In Data With The Quilt Versioned Data Hub

11 Feb 2023

Contributed by Lukas

Summary Data is a team sport, but it's often difficult for everyone on the team to participate. For a long time the mantra of data tools has be...

Reflecting On The Past 6 Years Of Data Engineering

06 Feb 2023

Contributed by Lukas

Summary This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have ...

Let Your Business Intelligence Platform Build The Models Automatically With Omni Analytics

30 Jan 2023

Contributed by Lukas

Summary Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow. Data analyst...

Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI

22 Jan 2023

Contributed by Lukas

Summary The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in th...

Building Applications With Data As Code On The DataOS

16 Jan 2023

Contributed by Lukas

Summary The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. ...

Automate Your Pipeline Creation For Streaming Data Transformations With SQLake

08 Jan 2023

Contributed by Lukas

Summary Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. ...

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

29 Dec 2022

Contributed by Lukas

Summary Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organiza...

Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

29 Dec 2022

Contributed by Lukas

Summary With all of the messaging about treating data as a product it is becoming difficult to know what that even means. Vishal Singh is the head ...

An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

26 Dec 2022

Contributed by Lukas

Summary Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operati...

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems

26 Dec 2022

Contributed by Lukas

Summary Encryption and security are critical elements in data analytics and machine learning applications. We have well-developed protocols and pra...

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

19 Dec 2022

Contributed by Lukas

Summary One of the reasons that data work is so challenging is that no single person or team owns the entire process. This introduces friction i...

Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle

19 Dec 2022

Contributed by Lukas

Summary The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of th...

Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee

12 Dec 2022

Contributed by Lukas

Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learn...

Run Your Applications Worldwide Without Worrying About The Database With Planetscale

12 Dec 2022

Contributed by Lukas

Summary One of the most critical aspects of software projects is managing their data. Managing the operational concerns for your database can be comple...

Business Intelligence In The Palm Of Your Hand With Zing Data

05 Dec 2022

Contributed by Lukas

Summary Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is thro...

Adopting Real-Time Data At Organizations Of Every Size

05 Dec 2022

Contributed by Lukas

Summary The term "real-time data" brings with it a combination of excitement, uncertainty, and skepticism. The promise of insights that are...

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

28 Nov 2022

Contributed by Lukas

Summary The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This...

Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase

28 Nov 2022

Contributed by Lukas

Summary The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information...

A Look At The Data Systems Behind The Gameplay For League Of Legends

21 Nov 2022

Contributed by Lukas

Summary The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal bus...

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

21 Nov 2022

Contributed by Lukas

Summary The problems that are easiest to fix are the ones that you prevent from happening in the first place. Sifflet is a platform that brings your ...

Taking A Look Under The Hood At CreditKarma's Data Platform

14 Nov 2022

Contributed by Lukas

Summary CreditKarma builds data products that help consumers take advantage of their credit and financial capabilities. To make that possible they ne...

Build Data Products Without A Data Team Using AgileData

14 Nov 2022

Contributed by Lukas

Summary Building data products is an undertaking that has historically required substantial investments of time and talent. With the rise in cloud pl...

Build Better Data Products By Creating Data, Not Consuming It

07 Nov 2022

Contributed by Lukas

Summary A lot of the work that goes into data engineering is trying to make sense of the "data exhaust" from other applications and service...

Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

07 Nov 2022

Contributed by Lukas

Summary Despite the best efforts of data engineers, data is as messy as the real world. Entity resolution and fuzzy matching are powerful utilities f...

Expanding The Reach of Business Intelligence Through Ubiquitous Embedded Analytics With Sisense

31 Oct 2022

Contributed by Lukas

Summary Business intelligence has grown beyond its initial manifestation as dashboards and reports. In its current incarnation it has become a ubiqui...

Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

30 Oct 2022

Contributed by Lukas

Summary One of the most impactful technologies for data analytics in recent years has been dbt. It’s hard to have a conversation about data eng...

How To Bring Agile Practices To Your Data Projects

23 Oct 2022

Contributed by Lukas

Summary Agile methodologies have been adopted by a majority of teams for building software applications. Applying those same practices to data can pr...

Going From Transactional To Analytical And Self-managed To Cloud On One Database With MariaDB

23 Oct 2022

Contributed by Lukas

Summary The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nea...

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

16 Oct 2022

Contributed by Lukas

Summary The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction suppo...

Speeding Up The Time To Insight For Supply Chains And Logistics With The Pathway Database That Thinks

16 Oct 2022

Contributed by Lukas

Summary Logistics and supply chains have been under increased stress and scrutiny in recent years. In order to stay ahead of customer demands, businesses n...

Making The Open Data Lakehouse Affordable Without The Overhead At Iomete

10 Oct 2022

Contributed by Lukas

Summary The core of any data platform is the centralized storage and processing layer. For many that is a data warehouse, but in order to support a d...

Investing In Understanding The Customer Journey At American Express

10 Oct 2022

Contributed by Lukas

Summary For any business that wants to stay in operation, the most important thing they can do is understand their customers. American Express has in...

Gain Visibility And Insight Into Your Supply Chains Through Operational Analytics Powered By Roambee

03 Oct 2022

Contributed by Lukas

Summary The global economy is dependent on complex and dynamic networks of supply chains powered by sophisticated logistics. This requires a signific...

Make Data Lineage A Ubiquitous Part Of Your Work By Simplifying Its Implementation With Alvin

03 Oct 2022

Contributed by Lukas

Summary Data lineage is something that has grown from a convenient feature to a critical need as data systems have grown in scale, complexity, and ce...

Power Your Real-Time Analytics Without The Headache Using Fivetran's Change Data Capture Integrations

26 Sep 2022

Contributed by Lukas

Summary Data integration from source systems to their downstream destinations is the foundational step for any data product. With the increasing expe...

Build A Common Understanding Of Your Data Reliability Rules With Soda Core and Soda Checks Language

26 Sep 2022

Contributed by Lukas

Summary Regardless of how data is being used, it is critical that the information is trusted. The practice of data reliability engineering has gained...

Operational Analytics To Increase Efficiency For Multi-Location Businesses With OpsAnalitica

19 Sep 2022

Contributed by Lukas

Summary In order to improve efficiency in any business you must first know what is contributing to wasted effort or missed opportunities. When your b...

Building A Shared Understanding Of Data Assets In A Business Through A Single Pane Of Glass With Workstream

19 Sep 2022

Contributed by Lukas

Summary There is a constant tension in business data between growing silos and breaking them down. Even when a tool is designed to integrate inform...

Build Confidence In Your Data Platform With Schema Compatibility Reports That Span Systems And Domains Using Schemata

12 Sep 2022

Contributed by Lukas

Summary Data engineering systems are complex and interconnected with myriad and often opaque chains of dependencies. As they scale, the problems of v...

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

12 Sep 2022

Contributed by Lukas

Summary Any business that wants to understand their operations and customers through data requires some form of pipeline. Building reliable data pipe...

A Reflection On Data Observability As It Reaches Broader Adoption

05 Sep 2022

Contributed by Lukas

Summary Data observability is a product category that has seen massive growth and adoption in recent years. Monte Carlo is in the vanguard of compani...

Introduce Climate Analytics Into Your Data Platform Without The Heavy Lifting Using Sust Global

05 Sep 2022

Contributed by Lukas

Summary The global climate impacts everyone, and the rate of change introduces many questions that businesses need to consider. Getting answers to th...

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

29 Aug 2022

Contributed by Lukas

Summary The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines ar...

Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

28 Aug 2022

Contributed by Lukas

Summary AirBnB pioneered a number of the organizational practices that have become the goal of modern data teams. Out of that culture a number of suc...

An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications

22 Aug 2022

Contributed by Lukas

Summary Data has permeated every aspect of our lives and the products that we interact with. As a result, end users and customers have come to expect...

Understanding The Role Of The Chief Data Officer

22 Aug 2022

Contributed by Lukas

Summary The position of Chief Data Officer (CDO) is relatively new in the business world and has not been universally adopted. As a result, not every...

Bringing Automation To Data Labeling For Machine Learning With Watchful

14 Aug 2022

Contributed by Lukas

Summary Data engineers have typically left the process of data labeling to data scientists or other roles because of its nature as a manual and proce...

Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

14 Aug 2022

Contributed by Lukas

Summary Data is useless if it isn’t being used, and you can’t use it if you don’t know where it is. Data catalogs were the first so...

Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab

06 Aug 2022

Contributed by Lukas

Summary Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural patter...

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

06 Aug 2022

Contributed by Lukas

Summary The optimal format for storage and retrieval of data is dependent on how it is going to be used. For analytical systems there are decades of ...

Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda

31 Jul 2022

Contributed by Lukas

Summary Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small data...

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

31 Jul 2022

Contributed by Lukas

Summary Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model,...

Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster

24 Jul 2022

Contributed by Lukas

Summary The current stage of evolution in the data management ecosystem has resulted in domain and use case specific orchestration capabilities being...

Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering

24 Jul 2022

Contributed by Lukas

Summary Data engineering is a difficult job, requiring a large number of skills that often don’t overlap. Any effort to understand how to start...

Making The Total Cost Of Ownership For External Data Manageable With Crux

17 Jul 2022

Contributed by Lukas

Summary There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or ...

Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast

17 Jul 2022

Contributed by Lukas

Summary Data engineering is a large and growing subject, with new technologies, specializations, and "best practices" emerging at an accele...

Charting the Path of Riskified's Data Platform Journey

10 Jul 2022

Contributed by Lukas

Summary Building a data platform is a journey, not a destination. Beyond the work of assembling a set of technologies and building integrations acros...

Maintain Your Data Engineers' Sanity By Embracing Automation

10 Jul 2022

Contributed by Lukas

Summary Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to ...

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff

03 Jul 2022

Contributed by Lukas

Summary The perennial challenge of data engineers is ensuring that information is integrated reliably. While it is straightforward to know whether a ...

The View From The Lakehouse Of Architectural Patterns For Your Data Platform

03 Jul 2022

Contributed by Lukas

Summary The ecosystem for data tools has been going through rapid and constant evolution over the past several years. These technological shifts have...

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

27 Jun 2022

Contributed by Lukas

Summary The proliferation of sensors and GPS devices has dramatically increased the number of applications for spatial data, and the need for scalabl...

Strategies And Tactics For A Successful Master Data Management Implementation

27 Jun 2022

Contributed by Lukas

Summary The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Da...

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

19 Jun 2022

Contributed by Lukas

Summary Data analysis is a valuable exercise that is often out of reach of non-technical users as a result of the complexity of data systems. In orde...

Level Up Your Data Platform With Active Metadata

19 Jun 2022

Contributed by Lukas

Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. A variety of platforms have b...

Discover And De-Clutter Your Unstructured Data With Aparavi

13 Jun 2022

Contributed by Lukas

Summary Unstructured data takes many forms in an organization. From a data engineering perspective that often means things like JSON files, audio or ...

Hire And Scale Your Data Team With Intention

13 Jun 2022

Contributed by Lukas

Summary Building a well-rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. ...

Simplify Data Security For Sensitive Information With The Skyflow Data Privacy Vault

06 Jun 2022

Contributed by Lukas

Summary The best way to make sure that you don’t leak sensitive data is to never have it in the first place. The team at Skyflow decided that t...

Bringing The Modern Data Stack To Everyone With Y42

06 Jun 2022

Contributed by Lukas

Summary Cloud services have made highly scalable and performant data platforms economical and manageable for data teams. However, they are still chal...

Data Cloud Cost Optimization With Bluesky Data

30 May 2022

Contributed by Lukas

Summary The latest generation of data warehouse platforms has brought unprecedented operational simplicity and effectively infinite scale. Along wit...
