
Data Engineering Podcast

Technology Education

Episodes

Showing 101-200 of 508
Page 2 of 6

Designing Data Platforms For Fintech Companies

01 Jan 2024

Contributed by Lukas

Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In...

Troubleshooting Kafka In Production

24 Dec 2023

Contributed by Lukas

Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it ...

Adding An Easy Mode For The Modern Data Stack With 5X

18 Dec 2023

Contributed by Lukas

Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools fo...

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

11 Dec 2023

Contributed by Lukas

Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers f...

Designing Data Transfer Systems That Scale

04 Dec 2023

Contributed by Lukas

Summary The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfe...

Addressing The Challenges Of Component Integration In Data Platform Architectures

27 Nov 2023

Contributed by Lukas

Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities...

Unlocking Your dbt Projects With Practical Advice For Practitioners

20 Nov 2023

Contributed by Lukas

Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. While it is easy to adopt, there are many po...

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

13 Nov 2023

Contributed by Lukas

Summary Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of...

Shining Some Light In The Black Box Of PostgreSQL Performance

06 Nov 2023

Contributed by Lukas

Summary Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a...

Surveying The Market Of Database Products

30 Oct 2023

Contributed by Lukas

Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has ex...

Defining A Strategy For Your Data Products

23 Oct 2023

Contributed by Lukas

Summary The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable...

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

15 Oct 2023

Contributed by Lukas

Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challe...

Using Data To Illuminate The Intentionally Opaque Insurance Industry

09 Oct 2023

Contributed by Lukas

Summary The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a bu...

Building ETL Pipelines With Generative AI

01 Oct 2023

Contributed by Lukas

Summary Artificial intelligence applications require substantial high quality data, which is provided through ETL pipelines. Now that AI has reache...

Powering Vector Search With Real Time And Incremental Vector Indexes

25 Sep 2023

Contributed by Lukas

Summary The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare ...

Building Linked Data Products With JSON-LD

17 Sep 2023

Contributed by Lukas

Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Li...

An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem

10 Sep 2023

Contributed by Lukas

Summary Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that co...

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library

04 Sep 2023

Contributed by Lukas

Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, w...

Building An Internal Database As A Service Platform At Cloudflare

28 Aug 2023

Contributed by Lukas

Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted service...

Harnessing Generative AI For Creating Educational Content With Illumidesk

20 Aug 2023

Contributed by Lukas

Summary Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share th...

Unpacking The Seven Principles Of Modern Data Pipelines

14 Aug 2023

Contributed by Lukas

Summary Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up...

Quantifying The Return On Investment For Your Data Team

06 Aug 2023

Contributed by Lukas

Summary As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are be...

Strategies For A Successful Data Platform Migration

31 Jul 2023

Contributed by Lukas

Summary All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for you...

Build Real Time Applications With Operational Simplicity Using Dozer

24 Jul 2023

Contributed by Lukas

Summary Real-time data processing has steadily been gaining adoption due to advances in the accessibility of the technologies involved. Despite tha...

Datapreneurs - How Today's Business Leaders Are Using Data To Define The Future

17 Jul 2023

Contributed by Lukas

Summary Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row s...

Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling

09 Jul 2023

Contributed by Lukas

Summary For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered qu...

How Data Engineering Teams Power Machine Learning With Feature Platforms

03 Jul 2023

Contributed by Lukas

Summary Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and proced...

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

25 Jun 2023

Contributed by Lukas

Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized i...

How Column-Aware Development Tooling Yields Better Data Models

18 Jun 2023

Contributed by Lukas

Summary Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the co...

Build Better Tests For Your dbt Projects With Datafold And data-diff

11 Jun 2023

Contributed by Lukas

Summary Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be ...

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service

04 Jun 2023

Contributed by Lukas

Summary A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps h...

A Roadmap To Bootstrapping The Data Team At Your Startup

29 May 2023

Contributed by Lukas

Summary Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probabl...

Keep Your Data Lake Fresh With Real Time Streams Using Estuary

21 May 2023

Contributed by Lukas

Summary Batch vs. streaming is a long running debate in the world of data integration and transformation. Proponents of the streaming paradigm argu...

What Happens When The Abstractions Leak On Your Data

15 May 2023

Contributed by Lukas

Summary All of the advancements in our technology are based around the principles of abstraction. These are valuable until they break down, which is...

Use Consistent And Up To Date Customer Profiles To Power Your Business With Segment Unify

07 May 2023

Contributed by Lukas

Summary Every business has customers, and a critical element of success is understanding who they are and how they are using the company's products...

Realtime Data Applications Made Easier With Meroxa

24 Apr 2023

Contributed by Lukas

Summary Real-time capabilities have quickly become an expectation for consumers. The complexity of providing those capabilities is still high, howe...

Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic

16 Apr 2023

Contributed by Lukas

Summary Business intelligence has been chasing the promise of self-serve data for decades. As the capabilities of these systems have improved and b...

An Exploration Of The Composable Customer Data Platform

10 Apr 2023

Contributed by Lukas

Summary The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for dat...

Mapping The Data Infrastructure Landscape As A Venture Capitalist

03 Apr 2023

Contributed by Lukas

Summary The data ecosystem has been building momentum for several years now. As a venture capital investor Matt Turck has been trying to keep track...

Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite

25 Mar 2023

Contributed by Lukas

Summary The promise of streaming data is that it allows you to react to new information as it happens, rather than introducing latency by batching ...

Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed

19 Mar 2023

Contributed by Lukas

Summary As with all aspects of technology, security is a critical element of data applications, and the different controls can be at cross purposes...

Use Your Data Warehouse To Power Your Product Analytics With NetSpring

10 Mar 2023

Contributed by Lukas

Summary With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that...

Exploring The Nuances Of Building An Intentional Data Culture

06 Mar 2023

Contributed by Lukas

Summary The ecosystem for data professionals has matured to the point that there is a large and growing number of distinct roles. With the scope a...

Building A Data Mesh Platform At PayPal

27 Feb 2023

Contributed by Lukas

Summary There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Pe...

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

19 Feb 2023

Contributed by Lukas

Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limit...

Let The Whole Team Participate In Data With The Quilt Versioned Data Hub

11 Feb 2023

Contributed by Lukas

Summary Data is a team sport, but it's often difficult for everyone on the team to participate. For a long time the mantra of data tools has be...

Reflecting On The Past 6 Years Of Data Engineering

06 Feb 2023

Contributed by Lukas

Summary This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have ...

Let Your Business Intelligence Platform Build The Models Automatically With Omni Analytics

30 Jan 2023

Contributed by Lukas

Summary Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow. Data analyst...

Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI

22 Jan 2023

Contributed by Lukas

Summary The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in th...

Building Applications With Data As Code On The DataOS

16 Jan 2023

Contributed by Lukas

Summary The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. ...

Automate Your Pipeline Creation For Streaming Data Transformations With SQLake

08 Jan 2023

Contributed by Lukas

Summary Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grow. ...

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

29 Dec 2022

Contributed by Lukas

Summary Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organiza...

Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

29 Dec 2022

Contributed by Lukas

Summary With all of the messaging about treating data as a product it is becoming difficult to know what that even means. Vishal Singh is the head ...

An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

26 Dec 2022

Contributed by Lukas

Summary Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operati...

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems

26 Dec 2022

Contributed by Lukas

Summary Encryption and security are critical elements in data analytics and machine learning applications. We have well-developed protocols and pra...

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

19 Dec 2022

Contributed by Lukas

Summary One of the reasons that data work is so challenging is that no single person or team owns the entire process. This introduces friction i...

Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle

19 Dec 2022

Contributed by Lukas

Summary The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of th...

Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee

12 Dec 2022

Contributed by Lukas

Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learn...

Run Your Applications Worldwide Without Worrying About The Database With Planetscale

12 Dec 2022

Contributed by Lukas

Summary One of the most critical aspects of software projects is managing their data. Managing the operational concerns for your database can be comple...

Business Intelligence In The Palm Of Your Hand With Zing Data

05 Dec 2022

Contributed by Lukas

Summary Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is thro...

Adopting Real-Time Data At Organizations Of Every Size

05 Dec 2022

Contributed by Lukas

Summary The term "real-time data" brings with it a combination of excitement, uncertainty, and skepticism. The promise of insights that are...

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

28 Nov 2022

Contributed by Lukas

Summary The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This...

Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase

28 Nov 2022

Contributed by Lukas

Summary The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information...

A Look At The Data Systems Behind The Gameplay For League Of Legends

21 Nov 2022

Contributed by Lukas

Summary The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal bus...

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

21 Nov 2022

Contributed by Lukas

Summary The problems that are easiest to fix are the ones that you prevent from happening in the first place. Sifflet is a platform that brings your ...

Taking A Look Under The Hood At CreditKarma's Data Platform

14 Nov 2022

Contributed by Lukas

Summary CreditKarma builds data products that help consumers take advantage of their credit and financial capabilities. To make that possible they ne...

Build Data Products Without A Data Team Using AgileData

14 Nov 2022

Contributed by Lukas

Summary Building data products is an undertaking that has historically required substantial investments of time and talent. With the rise in cloud pl...

Build Better Data Products By Creating Data, Not Consuming It

07 Nov 2022

Contributed by Lukas

Summary A lot of the work that goes into data engineering is trying to make sense of the "data exhaust" from other applications and service...

Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

07 Nov 2022

Contributed by Lukas

Summary Despite the best efforts of data engineers, data is as messy as the real world. Entity resolution and fuzzy matching are powerful utilities f...

Expanding The Reach of Business Intelligence Through Ubiquitous Embedded Analytics With Sisense

31 Oct 2022

Contributed by Lukas

Summary Business intelligence has grown beyond its initial manifestation as dashboards and reports. In its current incarnation it has become a ubiqui...

Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

30 Oct 2022

Contributed by Lukas

Summary One of the most impactful technologies for data analytics in recent years has been dbt. It’s hard to have a conversation about data eng...

How To Bring Agile Practices To Your Data Projects

23 Oct 2022

Contributed by Lukas

Summary Agile methodologies have been adopted by a majority of teams for building software applications. Applying those same practices to data can pr...

Going From Transactional To Analytical And Self-managed To Cloud On One Database With MariaDB

23 Oct 2022

Contributed by Lukas

Summary The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nea...

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

16 Oct 2022

Contributed by Lukas

Summary The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction suppo...

Speeding Up The Time To Insight For Supply Chains And Logistics With The Pathway Database That Thinks

16 Oct 2022

Contributed by Lukas

Summary Logistics and supply chains have been under increased stress and scrutiny in recent years. In order to stay ahead of customer demands, businesses n...

Making The Open Data Lakehouse Affordable Without The Overhead At Iomete

10 Oct 2022

Contributed by Lukas

Summary The core of any data platform is the centralized storage and processing layer. For many that is a data warehouse, but in order to support a d...

Investing In Understanding The Customer Journey At American Express

10 Oct 2022

Contributed by Lukas

Summary For any business that wants to stay in operation, the most important thing they can do is understand their customers. American Express has in...

Gain Visibility And Insight Into Your Supply Chains Through Operational Analytics Powered By Roambee

03 Oct 2022

Contributed by Lukas

Summary The global economy is dependent on complex and dynamic networks of supply chains powered by sophisticated logistics. This requires a signific...

Make Data Lineage A Ubiquitous Part Of Your Work By Simplifying Its Implementation With Alvin

03 Oct 2022

Contributed by Lukas

Summary Data lineage is something that has grown from a convenient feature to a critical need as data systems have grown in scale, complexity, and ce...

Power Your Real-Time Analytics Without The Headache Using Fivetran's Change Data Capture Integrations

26 Sep 2022

Contributed by Lukas

Summary Data integration from source systems to their downstream destinations is the foundational step for any data product. With the increasing expe...

Build A Common Understanding Of Your Data Reliability Rules With Soda Core and Soda Checks Language

26 Sep 2022

Contributed by Lukas

Summary Regardless of how data is being used, it is critical that the information is trusted. The practice of data reliability engineering has gained...

Operational Analytics To Increase Efficiency For Multi-Location Businesses With OpsAnalitica

19 Sep 2022

Contributed by Lukas

Summary In order to improve efficiency in any business you must first know what is contributing to wasted effort or missed opportunities. When your b...

Building A Shared Understanding Of Data Assets In A Business Through A Single Pane Of Glass With Workstream

19 Sep 2022

Contributed by Lukas

Summary There is a constant tension in business data between growing siloes, and breaking them down. Even when a tool is designed to integrate inform...

Build Confidence In Your Data Platform With Schema Compatibility Reports That Span Systems And Domains Using Schemata

12 Sep 2022

Contributed by Lukas

Summary Data engineering systems are complex and interconnected with myriad and often opaque chains of dependencies. As they scale, the problems of v...

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

12 Sep 2022

Contributed by Lukas

Summary Any business that wants to understand their operations and customers through data requires some form of pipeline. Building reliable data pipe...

A Reflection On Data Observability As It Reaches Broader Adoption

05 Sep 2022

Contributed by Lukas

Summary Data observability is a product category that has seen massive growth and adoption in recent years. Monte Carlo is in the vanguard of compani...

Introduce Climate Analytics Into Your Data Platform Without The Heavy Lifting Using Sust Global

05 Sep 2022

Contributed by Lukas

Summary The global climate impacts everyone, and the rate of change introduces many questions that businesses need to consider. Getting answers to th...

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

29 Aug 2022

Contributed by Lukas

Summary The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines ar...

Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

28 Aug 2022

Contributed by Lukas

Summary AirBnB pioneered a number of the organizational practices that have become the goal of modern data teams. Out of that culture a number of suc...

An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications

22 Aug 2022

Contributed by Lukas

Summary Data has permeated every aspect of our lives and the products that we interact with. As a result, end users and customers have come to expect...

Understanding The Role Of The Chief Data Officer

22 Aug 2022

Contributed by Lukas

Summary The position of Chief Data Officer (CDO) is relatively new in the business world and has not been universally adopted. As a result, not every...

Bringing Automation To Data Labeling For Machine Learning With Watchful

14 Aug 2022

Contributed by Lukas

Summary Data engineers have typically left the process of data labeling to data scientists or other roles because of its nature as a manual and proce...

Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

14 Aug 2022

Contributed by Lukas

Summary Data is useless if it isn’t being used, and you can’t use it if you don’t know where it is. Data catalogs were the first so...

Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab

06 Aug 2022

Contributed by Lukas

Summary Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural patter...

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

06 Aug 2022

Contributed by Lukas

Summary The optimal format for storage and retrieval of data is dependent on how it is going to be used. For analytical systems there are decades of ...

Interactive Exploratory Data Analysis On Petabyte Scale Data Sets With Arkouda

31 Jul 2022

Contributed by Lukas

Summary Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small data...

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

31 Jul 2022

Contributed by Lukas

Summary Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model,...

Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster

24 Jul 2022

Contributed by Lukas

Summary The current stage of evolution in the data management ecosystem has resulted in domain and use case specific orchestration capabilities being...

Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering

24 Jul 2022

Contributed by Lukas

Summary Data engineering is a difficult job, requiring a large number of skills that often don’t overlap. Any effort to understand how to start...

Making The Total Cost Of Ownership For External Data Manageable With Crux

17 Jul 2022

Contributed by Lukas

Summary There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or ...
